Abstract. Stealthy attacks are a major threat to cyber security. In practice,
both attackers and defenders have resource constraints that could limit their
capabilities. Hence, to develop robust defense strategies, a promising approach
is to utilize game theory to understand the fundamental trade-offs involved.
Previous works in this direction, however, mainly focus on the single-node case
without considering strict resource constraints. In this paper, a
game-theoretic model for protecting a system of multiple nodes against stealthy
attacks is proposed. We consider the practical setting where the frequencies of
both attack and defense are constrained by limited resources, and an asymmetric
feedback structure where the attacker can fully observe the states of nodes
while largely hiding its actions from the defender. We characterize the
best-response strategies for both attacker and defender, and study the Nash
Equilibria of the game. We further study a sequential game where the defender
first announces its strategy and the attacker then responds accordingly, and
design an algorithm that finds a nearly optimal strategy for the defender to
commit to.

Keywords: Stealthy Attacks, Resource Constraints, Game Theory 


1 Introduction 

The landscape of cyber security is constantly evolving in response to
increasingly sophisticated cyber attacks. In recent years, Advanced Persistent
Threats (APTs) have become a major concern in cyber security. APT attacks have
several distinguishing properties that render traditional defense mechanisms
less effective. First, they are often launched by incentive-driven entities with
specific targets. Second, they are persistent in achieving their goals, and may
involve multiple stages or continuous operations over a long period of time.
Third, they are highly adaptive and stealthy, and often operate in a
“low-and-slow” fashion [7] to avoid being detected. In fact, some notorious
attacks remained undetected for months or longer [2116]. Hence, traditional
intrusion detection and prevention techniques that target one-shot and known
attack types are insufficient in the face of long-lasting and stealthy attacks.

* This work has been funded by QNRF fund NPRP 5-559-2-227 and
ARO-W911NF-15-1-0277.

Moreover, over the last decade, it has been increasingly realized that security
failures in information systems are often caused by a misunderstanding of the
incentives of the entities involved in the system rather than by a lack of
proper technical mechanisms. To this end, game-theoretic models have been
extensively applied to cyber security. Game theory provides a proper framework
to systematically reason about the strategic behavior of each side, and gives
insights into the design of cost-effective defense strategies. Traditional game
models, however, fail to capture the persistent and stealthy behavior of
advanced attacks. Further, they often model the cost of defense (or attack) as
part of the utility functions of the players, while ignoring the strict resource
constraints during the play of the game. For a large system with many
components, ignoring such constraints can lead to either over-provisioning or
under-provisioning of resources and to revenue loss.

In this paper, we study a two-player non-zero-sum game that explicitly models
stealthy attacks with resource constraints. We consider a system with N
independent nodes (or components), an attacker, and a defender. Over a
continuous time horizon, the attacker (defender) determines when to attack
(recapture) a node, subject to a unit cost per action that varies over nodes. At
any time t, a node is either compromised or protected, depending on whether the
player that makes the last move (i.e., action) towards it before t is the
attacker or the defender. A player obtains a value for each node under its
control per unit time, which again may vary over nodes. The total payoff to a
player is then the total value of the nodes under its control over the entire
time horizon minus the total cost incurred, and we are interested in the
long-term time-average payoffs.

To model stealthy attacks, we assume that the defender gets no feedback about
the attacker during the game. On the other hand, the defender’s moves are fully
observable to the attacker. This is a reasonable assumption in many cyber
security settings, as the attacker can often observe and learn the defender’s
behavior before taking actions. Moreover, we explicitly model their resource
constraints by placing an upper bound on the frequency of moves (over all the
nodes) for each player. We consider both Nash Equilibrium and Sequential
Equilibrium for this game model. In the latter case, we assume that the defender
is the leader that first announces its strategy, and the attacker then responds
with its best strategy. The sequential setting is often relevant in cyber
security, and can provide a higher payoff to the defender compared with Nash
Equilibrium. To simplify the analysis, we assume that the nodes are independent
in the sense that the proper functioning of one node does not depend on other
nodes, which serves as a first-order approximation of the more general setting
of interdependent nodes to be considered in our future work.
Our model is an extension of the asymmetric version of the FlipIt game. The
FlipIt game is a two-player non-zero-sum game recently proposed in response to
an APT attack towards RSA Data Security. In the FlipIt game, a single critical
resource (a node in our model) is considered. Each player obtains control over
the resource by “flipping” it subject to a cost. During the play of the game,
each player obtains delayed and possibly incomplete feedback on the other
player’s previous moves. A player’s strategy is then when to move over a time
horizon, and the solution of the game heavily depends on the class of strategies
adopted and the feedback structure of the game. In particular, a full analysis
of Nash Equilibria has only been obtained for two special cases: when both
players employ a periodic strategy [21], and when the attacker is stealthy and
the defender is observable, as in our model. However, both works consider a
single node and there is no resource constraint. The multi-node setting together
with the resource constraints imposes significant challenges in characterizing
both Nash and Sequential Equilibria. A different multi-node extension of the
FlipIt game has also been considered, where the attacker needs to compromise
either all the nodes (AND model) or a single node (OR model) to take over a
system. However, only preliminary analytic results are provided.

Our game model can be applied in various settings. One example is key rotation.
Consider a system with multiple nodes, e.g., multiple communication links or
multiple servers, that are protected by different keys. From time to time, the
attacker may compromise some of the keys, e.g., by leveraging zero-day
vulnerabilities and system-specific knowledge, while remaining undetected by
the defender. A common practice is to have a trusted key-management service
periodically generate fresh keys, without knowing when they are compromised. On
the other hand, the attacker can easily detect the expiration of a key (at a
negligible cost compared with re-compromising it). Both key rotation and
compromise incur a cost, and there is a constraint on the frequency of moves on
each side. There are other examples where our extension of the FlipIt game can
be useful, such as password reset and virtual-machine refresh.

We make the following contributions in this paper.

- We propose a two-player game model with multiple independent nodes, an overt
  defender, and a stealthy attacker, where both players have strict resource
  constraints in terms of the frequency of protection/attack actions across all
  the nodes.

- We prove that the periodic strategy is a best-response strategy for the
  defender against a non-adaptive i.i.d. strategy of the attacker, and vice
  versa, for general distributions of attack times.

- For the above pair of strategies, we fully characterize the set of Nash
  Equilibria of our game, and show that there is always at least one
  equilibrium (and possibly more), for the case when the attack times are
  deterministic.

- We further consider the sequential game with the defender as the leader and
  the attacker as the follower. We design a dynamic-programming-based algorithm
  that identifies a nearly optimal strategy (in the sense of subgame perfect
  equilibrium) for the defender to commit to.
The remainder of this paper is organized as follows. We present our
game-theoretic model in Section 2 and study best-response strategies of both
players in Section 3. Analysis of Nash Equilibria of the game is provided in
Section 4 and the sequential game is studied in Section 5. In Section 6, we
present numerical results, and we conclude the paper in Section 7.


2 Game Model 

In this section, we discuss our two-player game model, including its
information structure, the action spaces of both attacker and defender, and
their payoffs. Our game model extends the single-node model of the asymmetric
FlipIt game to multiple nodes and adds a resource constraint for each player.


2.1 Basic Model 

In our game-theoretical model, there are two players and $N$ independent nodes.
The player who is the lawful user/owner of the $N$ nodes is called the defender,
while the other player is called the attacker. The game starts at time $t = 0$
and goes to any time $t = T$. We assume that time is continuous. A player can
make a move at any time instance subject to a cost per move. At any time $t$, a
node is under the control of the player that makes the last move towards the
node before $t$ (see Figure 1). Each attack towards node $i$ incurs a cost of
$C_i^A$ to the attacker, and it takes a random period of time $w_i$ to succeed.
On the other hand, when the defender makes a move to protect node $i$, which
incurs a cost of $C_i^D$, node $i$ is recovered immediately even if the attack
is still in process. Each node $i$ has a value $r_i$ that represents the benefit
that the attacker receives from node $i$ per unit of time when node $i$ is
compromised.

In addition to the move cost, we introduce a strict resource constraint for each
player, which is a practical assumption but has been ignored in most prior works
on security games. In particular, we place an upper bound on the average amount
of resource that is available to each player at any time (to be formally defined
below). As in typical security games, we assume that $r_i$, $C_i^A$, $C_i^D$,
the distribution of $w_i$, and the budget constraints are all common knowledge
of the game, that is, they are known to both players. For instance, they can be
learned from historical data and domain knowledge. Without loss of generality,
all nodes are assumed to be protected at time $t = 0$. Table 1 summarizes the
notation used in the paper.

As in the asymmetric FlipIt model, we consider an asymmetric feedback model
where the attacker’s moves are stealthy, while the defender’s moves are
observable. More specifically, at any time, the attacker knows the full history
of moves by the defender, as well as the state of each node, while the defender
has no idea about whether a node is compromised or not. Let $\alpha_{i,k}$
denote the time period the attacker waits from the latest time when node $i$ is
recovered, to the time when the attacker starts its $k$-th attack against node
$i$, which can be a random variable in general. The attacker’s action space is
then all the possible selections of $\{\alpha_{i,k}\}$. Since the nodes are
independent, we can assume $\alpha_{i,k}$ to be independent across $i$ without
loss of generality. However, they may be correlated across $k$ in general, as
the attacker can employ a time-correlated strategy. In contrast, the defender’s
strategy is to determine the time interval between its $(k-1)$-th move and
$k$-th move for each node $i$ and each $k$, denoted as $X_{i,k}$.

3 The terms “components” and “nodes” are interchangeable in this paper.

Fig. 1: Game Model

Table 1: List of Notations

Symbol            Meaning
$T$               time horizon
$N$               number of nodes
$r_i$             value per unit of time of compromising node $i$
$w_i$             attack time for node $i$
$C_i^A$           attacker’s move cost for node $i$
$C_i^D$           defender’s move cost for node $i$
$\alpha_{i,k}$    attacker’s waiting time in its $k$-th move for node $i$
$X_{i,k}$         time between the $(k-1)$-th and the $k$-th defense for node $i$
$B$               budget to the defender, greater than 0
$M$               budget to the attacker, greater than 0
$m_i$             frequency of defenses for node $i$
$p_i$             probability of immediate attack on node $i$ once it recovers
$L_i$             the number of defense moves for node $i$

In this paper, we focus on non-adaptive (but possibly randomized) strategies,
that is, neither the attacker nor the defender changes its strategy based on
feedback received during the game. Therefore, the values of $\alpha_{i,k}$ and
$X_{i,k}$ can be determined by the corresponding player before the game starts.
Note that assuming non-adaptive strategies is not a limitation for the defender
since it does not get any feedback during the game anyway. Interestingly, it
turns out not to be a big limitation on the attacker either. As we will show in
Section 3, periodic defense is a best-response strategy against any non-adaptive
i.i.d. attacks (formally defined in Definition 2) and vice versa. Note that when
the defender’s strategy is periodic, the attacker can predict the defender’s
moves before the game starts, so there is no need to be adaptive.
2.2 Defender’s Problem


Consider a fixed period of time $T$ and let $L_i$ denote the total number of
defense moves towards node $i$ during $T$. $L_i$ is a random variable in
general. The total amount of time when node $i$ is compromised is then
$T - \sum_{k=1}^{L_i} \min(\alpha_{i,k} + w_i, X_{i,k})$. Moreover, the cost for
defending node $i$ is $L_i C_i^D$. The defender’s payoff is then defined as the
total loss (non-positive) minus the total defense cost over all the nodes. Given
the attacker’s strategy $\{\alpha_{i,k}\}$, the defender faces the following
optimization problem:

$$
\begin{aligned}
\max_{\{X_{i,k}\},\, L_i} \quad & E\left[ \sum_{i=1}^{N} \frac{-\left(T - \sum_{k=1}^{L_i} \min(\alpha_{i,k} + w_i, X_{i,k})\right) \cdot r_i - L_i C_i^D}{T} \right] \\
\text{s.t.} \quad & \sum_{i=1}^{N} \frac{L_i}{T} \le B \quad \text{w.p. 1} \\
& \sum_{k=1}^{L_i} X_{i,k} \le T \quad \text{w.p. 1}, \ \forall i
\end{aligned}
\tag{1}
$$

The first constraint requires that the average number of nodes that can be
protected at any time is upper bounded by a constant $B$. The second constraint
defines the feasible set of $X_{i,k}$. Since $T$ is given, the expectation in
the objective function can be moved into the summation in the numerator.
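As a rough illustration of the objective in (1), the following Python sketch evaluates the defender's time-average payoff for one realization of the game. The function name and argument layout are hypothetical, and the sketch assumes a single realized attack time per node; it is not the authors' implementation.

```python
from typing import Sequence


def defender_payoff(T: float,
                    r: Sequence[float],
                    cost_d: Sequence[float],
                    w: Sequence[float],
                    alpha: Sequence[Sequence[float]],
                    X: Sequence[Sequence[float]]) -> float:
    """Time-average defender payoff for one realization (objective of (1)).

    r[i], cost_d[i], w[i]: value, defense cost and (assumed fixed) attack
    time of node i; alpha[i][k], X[i][k]: attacker waiting time and defense
    interval for the k-th period of node i.
    """
    total = 0.0
    for i in range(len(r)):
        # In each defense interval the node stays protected until the attack
        # succeeds (alpha + w) or the interval ends (X), whichever is first.
        protected = sum(min(a + w[i], x) for a, x in zip(alpha[i], X[i]))
        compromised = T - protected
        # Loss from compromised time plus cost of L_i = len(X[i]) moves.
        total += -compromised * r[i] - len(X[i]) * cost_d[i]
    return total / T


# One node, two defense periods of length 5, attacker attacks immediately.
print(defender_payoff(10.0, [2.0], [1.0], [1.0], [[0.0, 0.0]], [[5.0, 5.0]]))
```

If the attacker instead waits the whole interval (for example `alpha = [[5.0, 5.0]]`), the node is never compromised and only the defense cost remains.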


2.3 Attacker’s Problem 


We again let $L_i$ denote the total number of defense moves towards node $i$ in
$T$. The total cost of attacking $i$ is then
$\left(\sum_{k=1}^{L_i} \mathbf{1}_{\alpha_{i,k} < X_{i,k}}\right) \cdot C_i^A$,
where $\mathbf{1}_{\alpha_{i,k} < X_{i,k}} = 1$ if $\alpha_{i,k} < X_{i,k}$ and
$\mathbf{1}_{\alpha_{i,k} < X_{i,k}} = 0$ otherwise. It is important to note
that when $\alpha_{i,k} \ge X_{i,k}$, the attacker actually gives up its $k$-th
attack against node $i$ (this is possible as the attacker can observe when the
defender moves). Given the defender’s strategy, the attacker’s problem can be
formulated as follows, where $M$ is an upper bound on the average number of
nodes that the attacker can attack at any time instance.

$$
\begin{aligned}
\max_{\alpha_{i,k}} \quad & E\left[ \sum_{i=1}^{N} \frac{\left(T - \sum_{k=1}^{L_i} \min(\alpha_{i,k} + w_i, X_{i,k})\right) \cdot r_i - \left(\sum_{k=1}^{L_i} \mathbf{1}_{\alpha_{i,k} < X_{i,k}}\right) \cdot C_i^A}{T} \right] \\
\text{s.t.} \quad & E\left[ \sum_{i=1}^{N} \frac{\int_0^T v_i(t)\,dt}{T} \right] \le M
\end{aligned}
\tag{2}
$$

where $v_i(t) = 1$ if the attacker is attacking node $i$ at time $t$ and
$v_i(t) = 0$ otherwise. Note that we make the assumption that the attacker has
to keep consuming resources when the attack is in progress instead of making an
instantaneous move like the defender; hence it has a different form of budget
constraint.
On the other hand, we assume that $C_i^A$ captures the total cost for each
attack on node $i$, which is independent of the attack time. We further have the
following equation:

$$
\int_0^T v_i(t)\,dt = \sum_{k=1}^{L_i} \left( \min(\alpha_{i,k} + w_i, X_{i,k}) - \min(\alpha_{i,k}, X_{i,k}) \right)
\tag{3}
$$
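Equation (3) can be checked with a short Python sketch. The helper names below are hypothetical, and the attack time of a node is assumed to take a single realized value; the second function evaluates the left-hand side of the budget constraint in (2) for one realization.

```python
from typing import Sequence


def attack_busy_time(w_i: float,
                     alpha_i: Sequence[float],
                     X_i: Sequence[float]) -> float:
    """Total time the attacker spends attacking one node, as in (3).

    In the k-th defense interval the attack starts at min(alpha, X) and stops
    at min(alpha + w, X); the difference is zero when the attacker gives up.
    """
    return sum(min(a + w_i, x) - min(a, x) for a, x in zip(alpha_i, X_i))


def attack_budget_usage(T: float,
                        w: Sequence[float],
                        alpha: Sequence[Sequence[float]],
                        X: Sequence[Sequence[float]]) -> float:
    """Average number of nodes under attack at any time (to compare with M)."""
    return sum(attack_busy_time(w[i], alpha[i], X[i]) for i in range(len(w))) / T


# Three intervals of length 5: a full attack, an interrupted one, a skipped one.
print(attack_busy_time(1.0, [0.0, 4.5, 6.0], [5.0, 5.0, 5.0]))
```

The three terms are 1, 0.5 and 0: an attack that starts 0.5 before the next defense move is cut short, and a waiting time beyond the interval contributes nothing.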


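Putting (3) into (2) and moving the expectation inside, the attacker’s problem
becomes

$$
\begin{aligned}
\max_{\alpha_{i,k}} \quad & \sum_{i=1}^{N} \frac{T \cdot r_i - E\left[\sum_{k=1}^{L_i} \min(\alpha_{i,k} + w_i, X_{i,k})\right] \cdot r_i - E\left[\sum_{k=1}^{L_i} P(\alpha_{i,k} < X_{i,k})\right] \cdot C_i^A}{T} \\
\text{s.t.} \quad & \sum_{i=1}^{N} \frac{E\left[\sum_{k=1}^{L_i} \left( \min(\alpha_{i,k} + w_i, X_{i,k}) - \min(\alpha_{i,k}, X_{i,k}) \right)\right]}{T} \le M
\end{aligned}
\tag{4}
$$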


3 Best Responses 

In this section, we analyze the best-response strategies for both players. Our
main result is that when the attacker employs a non-adaptive i.i.d. strategy, a
periodic strategy is a best response for the defender, and vice versa. To prove
this result, we also characterize best responses in more general settings.


3.1 Defender’s Best Response 

We first show that for the defender’s problem (1), an optimal deterministic
strategy is also optimal in general. We then provide a sufficient condition for
a deterministic strategy to be optimal against any non-adaptive attacks.
Finally, we show that periodic defense is optimal against non-adaptive i.i.d.
attacks.

Lemma 1. Suppose $X^*_{i,k}$ and $L^*_i$ are the optimal solutions of (1) among
all deterministic strategies. Then they are also optimal among all the
strategies, including both deterministic and randomized strategies.

Proof. Define $x_{i,k}$ and $l_i$ as the realizations of $X_{i,k}$ and $L_i$,
respectively, and let
$\mathcal{C} = \{[x_{i,k}, l_i] \mid \sum_{i=1}^{N} \frac{l_i}{T} \le B \text{ and } \sum_{k=1}^{l_i} x_{i,k} \le T\}$.
Then, we denote $U^D(X_{i,k}, L_i)$ as the target function of (1) and denote

$$
U^D(x_{i,k}, l_i) = \sum_{i=1}^{N} \frac{-\left(T - \sum_{k=1}^{l_i} E[\min(\alpha_{i,k} + w_i, x_{i,k})]\right) \cdot r_i - l_i C_i^D}{T}
\tag{5}
$$

Now, for any $[X_{i,k}, L_i] \in \mathcal{C}$ w.p. 1, we have

$$
\begin{aligned}
U^D(X_{i,k}, L_i) &= P(X_{i,k} = X^*_{i,k}, L_i = L^*_i, \forall i, k) \cdot U^D(X^*_{i,k}, L^*_i) \\
&\quad + \sum_{[x_{i,k}, l_i] \in \mathcal{C}} P(X_{i,k} = x_{i,k}, L_i = l_i, \forall i, k) \cdot U^D(x_{i,k}, l_i) \\
&\le U^D(X^*_{i,k}, L^*_i)
\end{aligned}
\tag{6}
$$

The equality holds only when $X_{i,k} = X^*_{i,k}$, $L_i = L^*_i$ $\forall i, k$
w.p. 1. Therefore, $X^*_{i,k}$ and $L^*_i$ are also optimal among all the
defender’s strategies. □

According to the lemma, it suffices to consider the defender’s strategies where
both $X_{i,k}$ and $L_i$ are deterministic.

Definition 1. For a given $L_i$, we define a set $\mathcal{X}_i$ including all
deterministic defense strategies with the following properties:

1. $\sum_{k=1}^{L_i} X_{i,k} = T$;

2. $F_{\alpha_{i,k}+w_i}(X_{i,k}) = F_{\alpha_{i,j}+w_i}(X_{i,j})$ $\forall k, j$,

where $F_{\alpha_{i,k}+w_i}(\cdot)$ is the CDF of the r.v. $\alpha_{i,k} + w_i$.

Note that $\mathcal{X}_i$ can be an empty set in general due to the randomness
of $\alpha_{i,k} + w_i$. The following lemma shows that when $\mathcal{X}_i$ is
non-empty for all $i$, any strategy that belongs to $\mathcal{X}_i$ is the
defender’s best deterministic strategy against a non-adaptive attacker.

Lemma 2. For any given set of $\{L_i\}$ with $\sum_{i=1}^{N} \frac{L_i}{T} \le B$,
if $\mathcal{X}_i \ne \emptyset$ $\forall i$, then any set of $\{X_{i,k}\}$ which
belongs to $\mathcal{X}_i$ is the defender’s best deterministic strategy.


Proof. We first define the defender’s payoff for node $i$ as

$$
U_i^D(X_{i,k}, L_i) = \frac{-\left(T - \sum_{k=1}^{L_i} E[\min(\alpha_{i,k} + w_i, X_{i,k})]\right) \cdot r_i - L_i C_i^D}{T}
\tag{7}
$$

Since $\{L_i\}$ are fixed, Problem (1) can be divided into $N$ independent
sub-problems as follows:

$$
\begin{aligned}
\max_{X_{i,k}} \quad & U_i^D(X_{i,k}) \\
\text{s.t.} \quad & \sum_{k=1}^{L_i} X_{i,k} \le T
\end{aligned}
\tag{8}
$$

Taking second derivatives of $U_i^D(\{X_{i,k}\})$ with respect to $X_i$, we have

$$
\frac{\partial^2 U_i^D(X_{i,k})}{\partial X_i^2} =
\begin{pmatrix}
-\frac{f_{\alpha_{i,1}+w_i}(X_{i,1}) \, r_i}{T} & 0 & \cdots & 0 \\
0 & \ddots & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & -\frac{f_{\alpha_{i,L_i}+w_i}(X_{i,L_i}) \, r_i}{T}
\end{pmatrix}
\tag{9}
$$

where $f_{\alpha_{i,k}+w_i}(\cdot)$ is the p.d.f. of the r.v.
$\alpha_{i,k} + w_i$.

It follows that the objective function is concave because the above matrix is
negative semidefinite. Here, we assume that $F_{\alpha_{i,k}+w_i}(X_{i,k})$ is
continuous. However, the concavity can still be proved in general using the
subgradient concept. Since $U_i^D(X_{i,k})$ is concave and continuously
differentiable, the KKT conditions are both sufficient and necessary. From the
KKT conditions, we have $u^* \left(\sum_{k=1}^{L_i} X_{i,k} - T\right) = 0$ and
$F_{\alpha_{i,k}+w_i}(X_{i,k}) = F_{\alpha_{i,j}+w_i}(X_{i,j})$, $\forall k, j$,
where $u^*$ is the Lagrangian multiplier. It is clear that $U_i^D(X_{i,k})$ is
maximized when the constraint is tight, that is,
$\sum_{k=1}^{L_i} X_{i,k} = T$. However, there may exist a set of $X_{i,k}$ s.t.
$\sum_{k=1}^{L_i} X_{i,k} < T$ that is still optimal for (8). Thus, the two
conditions in Definition 1 are only sufficient. □

Lemma 2 gives a sufficient condition for a deterministic defense strategy to be
optimal. The main idea of the proof is to show that the defender’s payoff for
each node $i$ is concave with respect to $X_{i,k}$. The optimality then follows
from the KKT conditions. Intuitively, the defender tries to equalize its
expected loss in each period in a deterministic way, which gives the defender
the most stable system to avoid a big loss in any particular period. We then
show that a periodic defense is sufficient when the attacker employs a
non-adaptive i.i.d. strategy, formally defined below.


Definition 2. An attack strategy is called non-adaptive i.i.d. if it is
non-adaptive, and $\alpha_{i,k}$ is independent across $i$ and is i.i.d. across
$k$.

Theorem 1. A periodic strategy is the best response for the defender if the
attacker employs a non-adaptive i.i.d. strategy.

Proof. For any fixed $\{L_i\}$, let
$X_i = [\frac{T}{L_i}, \ldots, \frac{T}{L_i}]$. It is easy to check that
$\{X_i\}$ satisfies the first property in Definition 1 and will satisfy the
second property if $\alpha_{i,k}$ is i.i.d. with respect to $k$. According to
Lemma 2, $\{X_i\}$ is an optimal (deterministic) solution respecting $\{L_i\}$.
It follows that if we let $\{L_i^*\}$ denote the optimal solution of

$$
\begin{aligned}
\max_{L_i} \quad & \sum_{i=1}^{N} \frac{-\left(T - \sum_{k=1}^{L_i} E[\min(\alpha_{i,k} + w_i, \frac{T}{L_i})]\right) \cdot r_i - L_i C_i^D}{T} \\
\text{s.t.} \quad & \sum_{i=1}^{N} \frac{L_i}{T} \le B
\end{aligned}
\tag{10}
$$

then $X_i^* = [\frac{T}{L_i^*}, \ldots, \frac{T}{L_i^*}]$ is an optimal solution
to the defender’s problem. Hence, a periodic strategy with period
$\frac{T}{L_i^*}$ for each node $i$ is a best-response strategy for the
defender. □


According to the theorem, the periodic strategy gives the defender the most
stable system when the attacker adopts the non-adaptive i.i.d. strategy. Since
the distribution of the attacker’s waiting time does not change with time, a
fixed defense interval provides the same expected payoff between every two
consecutive moves. Moreover, since the defender’s problem is a convex
optimization problem, the optimal defending frequency for a given attack
strategy can be easily determined by solving the convex program.
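The intuition behind Theorem 1 can be illustrated numerically. The sketch below, with a hypothetical helper name, assumes a discrete waiting-time distribution (given as value/probability pairs) that is i.i.d. across periods and a deterministic attack time; it compares the expected protected time of a periodic schedule against an uneven one with the same number of moves.

```python
from typing import Sequence, Tuple


def expected_protected_time(intervals: Sequence[float],
                            alpha_dist: Sequence[Tuple[float, float]],
                            w: float) -> float:
    """Sum over defense intervals of E[min(alpha + w, X_k)].

    alpha_dist is a list of (waiting time, probability) pairs describing an
    assumed discrete distribution shared by all periods (i.i.d. across k).
    """
    total = 0.0
    for x in intervals:
        for a, prob in alpha_dist:
            total += prob * min(a + w, x)
    return total


alpha_dist = [(0.0, 0.5), (2.0, 0.5)]   # attack at once or wait 2 time units
periodic = [4.0, 4.0]                   # two moves, equal spacing, T = 8
uneven = [2.0, 6.0]                     # same T and same number of moves

print(expected_protected_time(periodic, alpha_dist, 1.0))
print(expected_protected_time(uneven, alpha_dist, 1.0))
```

Because $E[\min(\alpha + w, x)]$ is concave in $x$, splitting $T$ evenly yields at least as much protected time as any other split with the same number of moves, which is what the comparison shows in this small instance.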


3.2 Attacker’s Best Response 

We first analyze the attacker’s best response against any deterministic defense 
strategies, then show that the non-adaptive i.i.d. strategy is the best response 
against periodic defense. 

Lemma 3. When defense strategies are deterministic, the attacker’s best response
(among non-adaptive strategies) must satisfy the following condition:

$$
\alpha^*_{i,k}
\begin{cases}
= 0 & \text{w.p. } p_{i,k} \\
\ge X_{i,k} & \text{w.p. } 1 - p_{i,k}
\end{cases}
\tag{11}
$$

Proof sketch: The main idea of the proof is to divide the problem (4) into $N$
independent sub-problems, one for each node, where each sub-problem has a
similar target function and a budget $M_i$ with $\sum_{i=1}^{N} M_i = M$, as
follows (note that we consider an equivalent minimization problem by ignoring
the constant term $r_i$ in (4)).

$$
\begin{aligned}
\min_{\alpha_{i,k}} \quad & \sum_{k=1}^{L_i} \frac{E[\min(\alpha_{i,k} + w_i, X_{i,k})] \cdot r_i + P(\alpha_{i,k} < X_{i,k}) \cdot C_i^A}{T} \\
\text{s.t.} \quad & \sum_{k=1}^{L_i} \frac{E[\min(\alpha_{i,k} + w_i, X_{i,k})] - E[\min(\alpha_{i,k}, X_{i,k})]}{T} \le M_i
\end{aligned}
\tag{12}
$$

Each sub-problem is further divided into $L_i$ independent sub-problems with
budget $M_{i,k}$, where $\sum_{k=1}^{L_i} M_{i,k} = M_i$. Due to the
independence of nodes, it suffices to prove the lemma for any of these
sub-problems. The detailed proof is in Section [5]

Lemma 3 implies that for each node $i$, the attacker’s best strategy is to
either attack node $i$ immediately after it learns of the node’s recovery, or
give up the attack until the defender’s next move. There is no incentive for the
attacker to wait a small amount of time before attacking a node ahead of the
defender’s next move. The constraint $M$ actually determines the probability
that the attacker will attack immediately. If $M$ is large enough, the attacker
will never wait after each of the defender’s moves. We then find the attacker’s
best responses when the defender employs the periodic strategy.

Theorem 2. When the defender employs a periodic strategy, the non-adaptive
i.i.d. strategy is the attacker’s best response among all non-adaptive
strategies.

Proof. If the defender uses a periodic strategy where for each $i$,
$X_{i,k} = \frac{T}{L_i}$ $\forall k$, the sub-problems for node $i$ are
identical with respect to $k$. Therefore, setting all $p_{i,k}$ in (11) equal,
such that $\alpha_{i,k}$ is i.i.d. across $k$, is one of the best solutions for
(12) for any given $M_i$. Since the nodes are independent, the non-adaptive
i.i.d. strategy is also a best solution for (4) when the defender uses a
periodic strategy. □
3.3 Simplified Optimization Problems


According to Theorem 1 and Theorem 2, periodic defense and non-adaptive i.i.d.
attack can form a pair of best-response strategies with respect to each other.
Consider such a pair of strategies. Let
$m_i = \frac{L_i}{T} = \frac{1}{X_{i,k}}$, and let $p_i$ denote the probability
that $\alpha_{i,k} = 0$, $\forall k$. The optimization problems for the defender
and the attacker can then be simplified as follows.

Defender’s problem:

$$
\begin{aligned}
\max_{m_i} \quad & \sum_{i=1}^{N} \left[ \left( E\left[\min\left(w_i, \frac{1}{m_i}\right)\right] p_i r_i - C_i^D \right) \cdot m_i - p_i r_i \right] \\
\text{s.t.} \quad & \sum_{i=1}^{N} m_i \le B
\end{aligned}
\tag{13}
$$

Attacker’s problem:

$$
\begin{aligned}
\max_{p_i} \quad & \sum_{i=1}^{N} p_i \cdot \left( r_i \left( 1 - E\left[\min\left(w_i, \frac{1}{m_i}\right)\right] \cdot m_i \right) - m_i C_i^A \right) \\
\text{s.t.} \quad & \sum_{i=1}^{N} E\left[\min\left(w_i, \frac{1}{m_i}\right)\right] \cdot m_i \cdot p_i \le M
\end{aligned}
\tag{14}
$$

We observe that the defender’s problem is a continuous convex optimization
problem (see the discussion in Section 3.1), and the attacker’s problem is a
fractional knapsack problem. Therefore, the best-response strategy of each side
can be easily determined. Also, the time period $T$ disappears in both problems.
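Since the attacker's problem (14) is a fractional knapsack problem, a greedy allocation by value-to-weight ratio solves it. The following Python sketch is one way to do so under the simplifying assumption that each attack time $w_i$ is deterministic, so that $E[\min(w_i, 1/m_i)] = \min(w_i, 1/m_i)$; the function name is hypothetical.

```python
from typing import List, Sequence


def attacker_best_response(r: Sequence[float],
                           w: Sequence[float],
                           cost_a: Sequence[float],
                           m: Sequence[float],
                           M: float) -> List[float]:
    """Greedy fractional-knapsack solution of (14) for deterministic w_i."""
    n = len(r)
    p = [0.0] * n
    items = []
    for i in range(n):
        # Budget consumed per unit of p_i: fraction of time spent attacking.
        busy = min(w[i], 1.0 / m[i]) * m[i] if m[i] > 0 else 0.0
        # Attacker payoff per unit of p_i.
        value = r[i] * (1.0 - busy) - m[i] * cost_a[i]
        if value <= 0:
            continue                     # attacking node i never pays off
        if busy == 0:
            p[i] = 1.0                   # profitable and consumes no budget
        else:
            items.append((value / busy, i, busy))
    budget = M
    # Fill the budget in decreasing order of payoff per unit of budget.
    for _, i, busy in sorted(items, reverse=True):
        if budget <= 0:
            break
        p[i] = min(1.0, budget / busy)
        budget -= p[i] * busy
    return p


print(attacker_best_response([10.0, 4.0], [1.0, 1.0], [1.0, 1.0], [0.5, 0.5], 0.75))
```

In this example both nodes consume 0.5 units of budget when attacked with probability 1, so the more valuable node is attacked with certainty and the remaining budget of 0.25 covers half of the second node.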


4 Nash Equilibria 


In this section, we study the set of Nash Equilibria of the simplified game as
discussed in Section 3.3, where the defender employs a periodic strategy, and
the attacker employs a non-adaptive i.i.d. strategy. We further assume that the
attack time $w_i$ is deterministic for all $i$. We show that this game always
has a Nash equilibrium and may have multiple equilibria of different values.

We first observe that for deterministic $w_i$, when $m_i \ge \frac{1}{w_i}$, the
defender’s payoff becomes $-m_i C_i^D$, which is maximized when
$m_i = \frac{1}{w_i}$. Therefore, it suffices to consider
$m_i \le \frac{1}{w_i}$. Thus, the optimization problems for the defender and
the attacker can be further simplified as follows.
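For a given $p$, the defender aims at maximizing its payoff:

$$
\begin{aligned}
\max_{m_i} \quad & \sum_{i=1}^{N} \left[ m_i (r_i w_i p_i - C_i^D) - p_i r_i \right] \\
\text{s.t.} \quad & \sum_{i=1}^{N} m_i \le B \\
& 0 \le m_i \le \frac{1}{w_i}, \ \forall i
\end{aligned}
\tag{15}
$$

On the other hand, for a given $m$, the attacker aims at maximizing its payoff:

$$
\begin{aligned}
\max_{p_i} \quad & \sum_{i=1}^{N} p_i \left[ r_i - m_i (r_i w_i + C_i^A) \right] \\
\text{s.t.} \quad & \sum_{i=1}^{N} m_i w_i p_i \le M \\
& 0 \le p_i \le 1, \ \forall i
\end{aligned}
\tag{16}
$$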
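Problem (15) is a linear program with a single budget constraint and box constraints, so for a fixed $p$ a greedy fill by the coefficient $p_i r_i w_i - C_i^D$ gives a best response. The sketch below is only an illustration with a hypothetical function name; when a coefficient is exactly zero, any feasible $m_i$ is equally good and the sketch simply picks 0.

```python
from typing import List, Sequence


def defender_best_response(r: Sequence[float],
                           w: Sequence[float],
                           cost_d: Sequence[float],
                           p: Sequence[float],
                           B: float) -> List[float]:
    """Greedy solution of (15) for a fixed attack strategy p."""
    n = len(r)
    # Coefficient of m_i in the defender's objective.
    coef = [p[i] * r[i] * w[i] - cost_d[i] for i in range(n)]
    m = [0.0] * n
    budget = B
    # Spend the budget on nodes with the largest positive coefficient first.
    for i in sorted(range(n), key=lambda j: coef[j], reverse=True):
        if coef[i] <= 0 or budget <= 0:
            break
        m[i] = min(1.0 / w[i], budget)   # respect the cap m_i <= 1 / w_i
        budget -= m[i]
    return m


print(defender_best_response([10.0, 4.0], [1.0, 2.0], [1.0, 1.0], [1.0, 1.0], 1.25))
```

Note that this computes a best response to a given $p$ only; it does not by itself produce an equilibrium.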

For a pair of strategies $(m, p)$, the payoff to the defender is
$U_d(m, p) = \sum_{i=1}^{N} [m_i (p_i r_i w_i - C_i^D) - p_i r_i]$, while the
payoff to the attacker is
$U_a(m, p) = \sum_{i=1}^{N} p_i [r_i - m_i (r_i w_i + C_i^A)]$. A pair of
strategies $(m^*, p^*)$ is called a (pure strategy) Nash Equilibrium (NE) if for
any pair of strategies $(m, p)$, we have $U_d(m^*, p^*) \ge U_d(m, p^*)$ and
$U_a(m^*, p^*) \ge U_a(m^*, p)$. In the following, we assume that $C_i^A > 0$
and $C_i^D > 0$. The cases where $C_i^A = 0$ or $C_i^D = 0$ or both exhibit
slightly different structures, but can be analyzed using the same approach.
Without loss of generality, we assume $r_i > 0$ and
$\frac{C_i^D}{r_i w_i} \le 1$ for all $i$. Note that if $r_i = 0$, then node $i$
can be safely excluded from the game, while if $\frac{C_i^D}{r_i w_i} > 1$, the
coefficient of $m_i$ in $U_d$ (defined below) is always negative and there is no
need to protect node $i$.

Let $\mu_i(p) = p_i r_i w_i - C_i^D$ denote the coefficient of $m_i$ in $U_d$,
and $\rho_i(m) = \frac{r_i - m_i (r_i w_i + C_i^A)}{m_i w_i}$. Note that for a
given $p$, the defender tends to protect a component with higher $\mu_i(p)$
more, while for a given $m$, the attacker will attack a component with higher
$\rho_i(m)$ more frequently. When $m$ and $p$ are clear from the context, we
simply let $\mu_i$ and $\rho_i$ denote $\mu_i(p)$ and $\rho_i(m)$, respectively.

To find the set of NEs of our game, a key observation is that if there is a full
allocation of the defense budget $B$ to $m$ such that $\rho_i(m)$ is a constant
for all $i$, any full allocation of the attack budget $M$ gives the attacker the
same payoff. Among these allocations, if there is further an assignment of $p$
such that $\mu_i(p)$ is a constant for all $i$, then the defender also has no
incentive to deviate from $m$; hence $(m, p)$ forms an NE. The main challenge,
however, is that such an assignment of $p$ does not always exist for the whole
set of nodes. Moreover, there are NEs that do not fully utilize the defense or
attack budget, as we show below. To characterize the set of NEs, we first prove
the following properties satisfied by any NE of the game. For a given strategy
$(m, p)$, we define $\mu^*(p) = \max_i \mu_i(p)$,
$\rho^*(m) = \min_i \rho_i(m)$, $F(p) = \{i : \mu_i(p) = \mu^*(p)\}$, and
$D(m, p) = \{i \in F : \rho_i(m) = \rho^*(m)\}$. We omit $m$ and $p$ when they
are clear from the context.

Lemma 4. In any NE, (1) $m_i \le \frac{r_i}{r_i w_i + C_i^A}$ and (2)
$p_i \ge \frac{C_i^D}{r_i w_i}$.

Proof. To prove the first property, suppose
$m_i > \frac{r_i}{r_i w_i + C_i^A}$. Then $p_i$ must be 0; otherwise the benefit
for attacking $i$ becomes negative. This in turn implies that $m_i = 0$ by the
assumption that $C_i^D > 0$, a contradiction. To prove the second property,
suppose $p_i < \frac{C_i^D}{r_i w_i}$. Then we have $\mu_i < 0$, which implies
$m_i = 0$ and therefore $p_i = 1$ since $r_i > 0$, a contradiction. □


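Lemma 5. If $(m, p)$ is an NE, we have (see Table 2):

1. $\forall i \notin F$, $m_i = 0$, $p_i = 1$, $\rho_i = \infty$;

2. $\forall i \in F \setminus D$, $m_i \in [0, \frac{r_i}{r_i w_i + C_i^A}]$, $p_i = 1$;

3. $\forall i \in D$, $m_i \in [0, \frac{r_i}{r_i w_i + C_i^A}]$, $p_i \in [\frac{C_i^D}{r_i w_i}, 1]$.

Proof. We first show that if $m_i > 0$ and $m_j > 0$, then $\mu_i = \mu_j$.
Suppose $\mu_i < \mu_j$. Then it is better to protect $j$ than to protect $i$.
Since $m_i > 0$, we must have
$m_j = \frac{1}{w_j} > \frac{r_j}{r_j w_j + C_j^A}$ by the assumption that
$C_j^A > 0$, which contradicts Lemma 4. It follows that $m_i = 0$
$\forall i \notin F$ and $m_i \in [0, \frac{r_i}{r_i w_i + C_i^A}]$
$\forall i \in F$. Since when $m_i = 0$ we must have $\rho_i = \infty$ and
$p_i = 1$, we have $p_i = 1$, $\rho_i = \infty$ $\forall i \notin F$. It remains
to show that $p_i = 1$ for all $i \in F \setminus D$. Assuming
$F \setminus D \ne \emptyset$, we have $\rho_j < \infty$ for $j \in D$, which
implies that $m_j > 0$ for $j \in D$. Since $\rho_i > \rho^*$ for
$i \in F \setminus D$, it is more beneficial to attack $i$ than any $j \in D$.
Since $p_j > 0$ and $m_j > 0$ for $j \in D$, we must have $p_i = 1$. □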


Lemma 6. If $(m, p)$ forms an NE, then for $i \in D$, $j \in F \setminus D$ and
$k \notin F$, we have
$r_i w_i - C_i^D \ge r_j w_j - C_j^D > r_k w_k - C_k^D$.

Proof. Since $\mu_i = \mu_j$ for $i \in D$ and $j \in F \setminus D$ by the
definitions of $F$ and $D$, and $p_i \le p_j = 1$ by Lemma 5, we have
$r_i w_i - C_i^D \ge \mu_i = \mu_j \ge r_j w_j - C_j^D$. On the other hand,
since $\mu_j > \mu_k$ by the definition of $F$, and $p_j = p_k = 1$ by Lemma 5,
we have $r_j w_j - C_j^D = \mu_j > \mu_k = r_k w_k - C_k^D$. □

According to the above lemma, to find all the equilibria of the game, it
suffices to sort all the nodes in non-increasing order of $r_i w_i - C_i^D$, and
consider each $F_h$ consisting of the first $h$ nodes such that
$r_h w_h - C_h^D > r_{h+1} w_{h+1} - C_{h+1}^D$, and each subset
$D_k \subseteq F_h$ consisting of the first $k \le h$ nodes in the list. In the
following, we assume such an ordering of nodes. Consider a given pair of $F$ and
$D \subseteq F$. By Lemma 5 and the definitions of $F$ and $D$, the following
conditions are satisfied by any NE with $F(p) = F$ and $D(m, p) = D$.

$$
\begin{aligned}
& m_i = 0, \ p_i = 1, \ \forall i \notin F; && (17) \\
& m_i \in \left[0, \frac{r_i}{w_i r_i + C_i^A}\right], \ p_i = 1, \ \forall i \in F \setminus D; && (18) \\
& m_i \in \left[0, \frac{r_i}{w_i r_i + C_i^A}\right], \ p_i \in \left[\frac{C_i^D}{r_i w_i}, 1\right], \ \forall i \in D; && (19) \\
& \sum_{i \in F} m_i \le B, \ \sum_{i \in F} m_i w_i p_i \le M; && (20) \\
& \mu_i = \mu^*, \ \forall i \in F; \quad \mu_i < \mu^*, \ \forall i \notin F; && (21) \\
& \rho_i = \rho^*, \ \forall i \in D; \quad \rho_i > \rho^*, \ \forall i \notin D. && (22)
\end{aligned}
$$


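Table 2: Necessary Conditions for NEs

            $i \in D$                           $i \in F \setminus D$               $i \notin F$
$m_i$       $[0, \frac{r_i}{w_i r_i + C_i^A}]$  $[0, \frac{r_i}{w_i r_i + C_i^A}]$  $0$
$p_i$       $[\frac{C_i^D}{r_i w_i}, 1]$        $1$                                 $1$
$\mu_i$     $\mu^*$                             $\mu^*$                             $< \mu^*$
$\rho_i$    $\rho^*$                            $> \rho^*$                          $+\infty$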
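The candidate pairs $(F_h, D_k)$ implied by Lemma 6 can be enumerated mechanically. The Python sketch below is a hypothetical illustration of that enumeration only: it sorts nodes by $r_i w_i - C_i^D$, keeps a prefix $F_h$ only when it does not split a tie, and pairs it with every non-empty prefix $D_k$ (an assumption of this sketch). It does not check conditions (17) to (22).

```python
from typing import List, Sequence, Tuple


def candidate_sets(r: Sequence[float],
                   w: Sequence[float],
                   cost_d: Sequence[float]) -> List[Tuple[List[int], List[int]]]:
    """Enumerate candidate (F_h, D_k) pairs as lists of node indices."""
    n = len(r)
    key = [r[i] * w[i] - cost_d[i] for i in range(n)]
    # Non-increasing order of r_i * w_i - C_i^D (ties keep input order).
    order = sorted(range(n), key=lambda i: key[i], reverse=True)
    pairs = []
    for h in range(1, n + 1):
        # F_h is valid only if the h-th key strictly exceeds the (h+1)-th.
        if h < n and key[order[h - 1]] <= key[order[h]]:
            continue
        for k in range(1, h + 1):
            pairs.append((order[:h], order[:k]))
    return pairs


for F, D in candidate_sets([3.0, 5.0, 5.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]):
    print(F, D)
```

In this example nodes 1 and 2 tie on the sort key, so no candidate $F$ contains one without the other. Each candidate pair would then be tested against the constraint sets of the characterization theorem.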


The following theorem provides a full characterization of the set of NEs of 
the game. 


Theorem 3. Any pair of strategies $(m, p)$ with $F(p) = F$ and $D(m, p) = D$ is an NE iff it is a solution to one of the following sets of constraints in addition to (17) to (22).

1. $\sum_{i \in F} m_i = B$; $\mu^* = 0$;
2. $\sum_{i \in F} m_i = B$; $\mu^* > 0$; $\sum_{i \in F} m_i w_i p_i = M$;
3. $\sum_{i \in F} m_i = B$; $\mu^* > 0$; $p_i = 1, \forall i \in F$;
4. $\sum_{i \in F} m_i < B$; $\rho^* = 0$; $F = F_N$; $\mu^* = 0$;
5. $\sum_{i \in F} m_i < B$; $\rho^* = 0$; $F = F_N$; $\mu^* > 0$; $\sum_{i \in F} m_i w_i p_i = M$;
6. $\sum_{i \in F} m_i < B$; $\rho^* = 0$; $F = F_N$; $\mu^* > 0$; $p_i = 1, \forall i \in F$.


Proof. We first consider the cases when the budget constraint of the defender is tight, i.e., $\sum_{i \in F} m_i = B$ (cases 1-3). Since $m_i \le \frac{r_i}{w_i r_i + C_i^A}$ in any NE and $m_i = 0$ for $i$ not in $F$ (Lemma 5), we must have $B \le \sum_{i \in F} \frac{r_i}{w_i r_i + C_i^A}$ in any NE. If $B = \sum_{i \in F} \frac{r_i}{w_i r_i + C_i^A}$, we have $\mu^* = 0$ (case 1). Assume $B < \sum_{i \in F} \frac{r_i}{w_i r_i + C_i^A}$. First consider the case $D = F$. We then have $m_i \le \frac{r_i}{w_i r_i + C_i^A}$, $i \in F$. Hence, $\mu^* > 0$ since $B < \sum_{i \in F} \frac{r_i}{w_i r_i + C_i^A}$. It follows that $\sum_{i \in F} m_i w_i p_i = M$ (case 2) unless $p_i = 1, \forall i \in F$ (case 3); otherwise, some $p_i$, $i \in F$, can be increased to get more benefit. Note that case 3 can happen only if $r_i w_i - C_i^D$ is the same for all $i \in F$. Next consider the case $D \subset F$. If $B \in [\sum_{i \in D} \frac{r_i}{w_i r_i + C_i^A}, \sum_{i \in F} \frac{r_i}{w_i r_i + C_i^A})$, we again have $\mu^* = 0$ and get case 1, but with extra constraints regarding $i \in F \setminus D$ as required by (18) and (??). Otherwise, if $B < \sum_{i \in D} \frac{r_i}{w_i r_i + C_i^A}$, applying a similar argument as above, we again have $\mu^* > 0$ and get case 2 or case 3 depending on whether the attacker's budget constraint is tight or not.

We next consider the cases when $\sum_{i \in F} m_i < B$ (cases 4-6). We first observe that $p_i = \frac{C_i^D}{r_i w_i}$, $\forall i \in F$, or equivalently, $\rho^* = 0$. Otherwise, if $\rho_i > 0$, $m_i$ can be further increased to reduce the cost, due to the fact that $m_i \le \frac{r_i}{w_i r_i + C_i^A} < \frac{1}{w_i}$ in any NE (by Lemma 5 and the assumption that $C_i^A > 0$), a contradiction. We then have $F = F_N$ by its definition. Cases 4-6 then follow from a similar argument as for cases 1-3 by distinguishing different values of $\mu^*$. □

In the following, NEs that fall into each of the six cases considered above are 
named as Type 1 - Type 6 NEs, respectively. The next theorem shows that our 
game has at least one equilibrium and may have more than one NE. 

Theorem 4. The attacker-defender game always has a pure strategy Nash Equilibrium, and may have more than one NE with different payoffs to the defender.

Proof. To show the first part, for any given index h < N, we define a pair of 
strategies ( m h ,p h ) as follows. Let = 0,Vi > h and let < h} be the 

solution to the constraints (1) Yli<h m i = an d (2) Pi is a constant for all i < h: 

Pi = l,Vi > h (hence p h+1 = r h+1 w h+1 - Cj?), and Vi < h,p^ = if 

h < N, p^ = 0 otherwise. 

We first prove the following claim. For a given h , let h' < h denote the 
smallest index such that rh'Wh' — C$ = rhWh — C®. Consider two pairs of 
strategies ( m h ,p h ) and ( m h ,p h ). We claim that if J2i<h m iWiPi < M and 
w iPi > Tf, then there is a Type 2 NE respecting Fh- Note that by 
definition, Y^i<h m i w iPi < M is always true when h = N. 

To prove the claim, we consider another pair of strategies (m h ,p h ). If we 
have J2i< h m i u ’iPi > M, then since J2i<h ml i w iPi < M, there must exist p 

with pi = 1, Vi > h, p h £ , 1], and pi = ,Vi < h such that 

^2 i <h rn i w iPi = M. Hence, ( m h ,p ) is a Type 2 NE. On the other hand, if 
J2 i<h miWiPi < M, then since J2 i<h mff Wip > M, there must exist m with 
irii = 0, Vi > h, mi £ [0,m*],V/i' < i < h, and {m!f,i < h} be the solution to 
the constraints (1) Yhi<h m!f = B and (2) pi is a constant for all i < h\ such 

that Y^i<h miWiPi = M. We again get a Type 2 NE. 

We then prove the theorem. First note that if B > J2 i<N w . r r +c A > then there 

is a Type 1 or Type 4 NE in F^. Assume B < Y^i<N trr Xc A ■ There is h < N 
such that B < Jfi< h Wi rXc A and B - E i<h' Wi rXc A ’ where h ' is defined as 
above. If there is an NE with respect to some F(h"),h" > h, we are done. 
Otherwise, we have Yli< h m^Wip^ < M by the claim. If Jf i<h mffwip^ > M, 
there is a Type 2 NE as proved above. Otherwise, consider the pair of strategies
(■ m' ,p h ) where to- = 0 ,Vi > h, to* = w . r r \ C A i Vi < ft', and {m/,i < fi} is the 

solution to the constraints (1) to/ = I? and (2) pi is a constant for all 

i < ft-'. If m'iWiPi > M, there is Type 2 NE. Otherwise, there must be a 
Type 1 NE. 

To show the second part, consider the following example with two nodes where $r_1 = r_2 = 1$, $w_1 = 2$, $w_2 = 1$, $C_1^D = 1/5$, $C_2^D = 4/5$, $C_1^A = 1$, $C_2^A = 7/2$, $B = 1/3$, and $M = 1/5$. It is easy to check that $m = (1/6, 1/6)$ and $p = (3/20, 9/10)$ is a Type 2 NE, and $m = (1/3, 0)$ and $p = (p_1, 1)$ with $p_1 \in [1/5, 3/10]$ are all Type 1 NEs, and all these NEs have different payoffs to the defender. □
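The numbers in this example can be checked mechanically. The sketch below uses exact rational arithmetic and assumes the forms $\rho_i = p_i r_i w_i - C_i^D$ and $\mu_i = \frac{r_i - m_i (r_i w_i + C_i^A)}{m_i w_i}$, which are consistent with conditions (17)-(22); the function names are illustrative only. It verifies a subset of the equilibrium conditions (tight budgets, equalized $\rho_i$ and $\mu_i$), not the full best-response property.

```python
from fractions import Fraction as Fr

# Parameters of the two-node example (index 0 is node 1, index 1 is node 2).
r = [Fr(1), Fr(1)]
w = [Fr(2), Fr(1)]
c_def = [Fr(1, 5), Fr(4, 5)]  # C_i^D
c_att = [Fr(1), Fr(7, 2)]     # C_i^A
B, M = Fr(1, 3), Fr(1, 5)

def rho(i, p):
    # Assumed form of the defender's marginal payoff: p_i r_i w_i - C_i^D.
    return p[i] * r[i] * w[i] - c_def[i]

def mu(i, m):
    # Assumed form of the attacker's payoff per unit of attack budget.
    # Returns None in place of +infinity when node i is undefended.
    if m[i] == 0:
        return None
    return (r[i] - m[i] * (r[i] * w[i] + c_att[i])) / (m[i] * w[i])

def attack_budget_used(m, p):
    return sum(m[i] * w[i] * p[i] for i in range(2))

def defender_payoff(m, p):
    return sum(m[i] * rho(i, p) - p[i] * r[i] for i in range(2))

# Candidate Type 2 NE: both budgets tight, rho and mu equal across nodes.
m2, p2 = [Fr(1, 6), Fr(1, 6)], [Fr(3, 20), Fr(9, 10)]
type2_ok = (
    sum(m2) == B
    and attack_budget_used(m2, p2) == M
    and rho(0, p2) == rho(1, p2) > 0
    and mu(0, m2) == mu(1, m2) > 0
)

# Candidate Type 1 NEs: only node 1 is defended, at the level where mu_1 = 0.
m1 = [Fr(1, 3), Fr(0)]

def type1_ok(p1):
    p = [p1, Fr(1)]
    return (
        sum(m1) == B
        and mu(0, m1) == 0
        and attack_budget_used(m1, p) <= M
        and rho(0, p) >= rho(1, p)
    )
```

Under these assumptions the defender's payoff is $-61/60$ in the Type 2 NE and ranges from $-17/15$ (at $p_1 = 1/5$) to $-7/6$ (at $p_1 = 3/10$) over the Type 1 NEs, which illustrates that the payoffs differ.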


5 Sequential Game 


In this section, we study a sequential version of the simplified game considered in the last section. In the simultaneous game, neither the defender nor the attacker can learn the opponent's strategy in advance. While this is a reasonable assumption for the defender, an advanced attacker can often observe and learn the defender's strategy before launching attacks. It therefore makes sense to consider the setting where the defender first commits to a strategy and makes it public, and the attacker then responds accordingly. Such a sequential game can actually give the defender a higher payoff than a Nash Equilibrium, since it gives the defender the opportunity to deter the attacker from moving. We again focus on non-adaptive strategies, and further assume that at $t = 0$, the leader (defender) has determined its strategy, and the follower (attacker) has learned the defender's strategy and determined its own strategy in response. In addition, the players do not change their strategies thereafter. Our objective is to identify the best sequential strategy for the defender to commit to, in the sense of subgame perfect equilibrium, defined as follows. We again focus on the case where $w_i$ is deterministic for all $i$.


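Definition 3. A pair of strategies $(m^*, p^*)$ is a subgame perfect equilibrium of the simplified game if $m^*$ is the optimal solution of

$$\begin{aligned} \max_{m_i} \quad & \sum_{i=1}^{N} \left[ m_i (r_i w_i p_i^* - C_i^D) - p_i^* r_i \right] \\ \text{s.t.} \quad & \sum_{i=1}^{N} m_i \le B \\ & 0 \le m_i \le \frac{1}{w_i},\ \forall i \end{aligned} \tag{23}$$

where $p^*$ is the optimal solution of

$$\begin{aligned} \max_{p_i} \quad & \sum_{i=1}^{N} p_i \left[ r_i - m_i (r_i w_i + C_i^A) \right] \\ \text{s.t.} \quad & \sum_{i=1}^{N} m_i w_i p_i \le M \\ & 0 \le p_i \le 1,\ \forall i \end{aligned} \tag{24}$$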

Note that in a subgame perfect equilibrium, $p^*$ is still the attacker's best response to $m^*$, as in a Nash Equilibrium. However, the defender's best strategy $m^*$ is not necessarily a best response to $p^*$. Due to the multi-node setting and the resource constraints, it is very challenging to identify an exact subgame perfect equilibrium strategy for the defender. To this end, we propose a dynamic programming based algorithm that finds a nearly optimal defense strategy.

Remark 1. Since for any given defense strategy $\{m_i\}$, the attacker's problem (24) is a fractional knapsack problem, the optimal $p_i$, $\forall i$, has the following form: sort the set of nodes by $\mu_i(m_i) = \frac{r_i - m_i (r_i w_i + C_i^A)}{m_i w_i}$ non-increasingly; then there is an index $k$ such that $p_i = 1$ for the first $k$ nodes, $p_i < 1$ for the $(k+1)$-th node, and $p_i = 0$ for the remaining nodes. However, if $\mu_i = \mu_j$ for some $i \neq j$, the optimal attack strategy is not unique. When this happens, we assume that the attacker always breaks ties in favor of the defender, a common practice in Stackelberg security games.
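The greedy structure described in Remark 1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, nodes with $\mu_i \le 0$ are simply left unattacked (one way of resolving the zero-gain tie in the defender's favor), and ties between nodes with equal positive $\mu_i$ are not given any special treatment.

```python
import math

def attacker_best_response(m, r, w, c_att, M):
    """Greedy fractional-knapsack response to a defense strategy m.

    m, r, w, c_att are per-node sequences; M is the attack budget.
    Returns the list of attack probabilities p_i.
    """
    n = len(m)

    def mu(i):
        # Payoff per unit of attack budget; an undefended node costs nothing.
        if m[i] == 0:
            return math.inf
        return (r[i] - m[i] * (r[i] * w[i] + c_att[i])) / (m[i] * w[i])

    order = sorted(range(n), key=mu, reverse=True)
    p = [0.0] * n
    remaining = M
    for i in order:
        if mu(i) <= 0:
            break  # attacking this node (or any later one) yields no gain
        cost = m[i] * w[i]  # budget needed to attack node i with probability 1
        if cost <= remaining:
            p[i] = 1.0
            remaining -= cost
        else:
            p[i] = remaining / cost  # the single fractional node
            break
    return p
```

For instance, with $m = (0.1, 0.1)$, $r = (1, 1)$, $w = (2, 1)$, $C^A = (1, 3.5)$ and $M = 0.2$, node 2 has the larger ratio and is attacked with probability 1, and the remaining budget gives $p_1 = 0.5$.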

Before we present our algorithm for the problem, we first establish the following structural properties of the subgame perfect equilibria of the game.

Lemma 7. In any subgame perfect equilibrium $(m, p)$, the set of nodes can be partitioned into the following four disjoint sets according to the attack and defense strategies applied:

1. $F = \{i \mid m_i > 0,\ p_i = 1\}$;
2. $D = \{i \mid m_i > 0,\ 0 < p_i < 1\}$;
3. $E = \{i \mid m_i > 0,\ p_i = 0\}$;
4. $G = \{i \mid m_i = 0,\ p_i = 1\}$.

Moreover, they satisfy the following properties:

1. $F \cup D \cup E \cup G = \{i \mid i = 1, \dots, n\}$ and $|D| \le 1$;
2. $\mu_i \ge \mu_k \ge \mu_j$ for $\forall i \in F$, $k \in D$, $j \in E$.

Proof. It is obvious that $F$, $D$, $E$ and $G$ are disjoint. The properties follow directly from the structure of the optimal solution to the attacker's problem and the remark made above. □

Since the set $D$ has at most one element, we use $m_d$ to represent $m_i$, $i \in D$, for simplicity, and let $\mu_d = \mu(m_d)$. If $D$ is empty, we pick any node $i$ in $F$ with minimum $\mu_i$ and treat it as a node in $D$.

Lemma 8. For any given nonnegative $\mu_d$, the optimal solution for (23)-(24) satisfies the following properties:

1. $r_i w_i - C_i^D \ge 0$ $\forall i \in F \cup E \cup D$;
2. $m_i \le \bar{m}_i$ $\forall i \in F$;
3. $m_j = \bar{m}_j$ $\forall j \in E$;
4. $\bar{m}_i \le \frac{1}{w_i}$ $\forall i$;
5. $B - \sum_{i \in E} \bar{m}_i - \bar{m}_d \ge 0$;

where $\bar{m}_i = m_i(\mu_d)$ and $m_i(\cdot)$ is the inverse function of $\mu_i(\cdot)$.

Proof. If $r_i w_i - C_i^D < 0$ for some node $i$, there is no point for the defender to defend such a node, since doing so only makes the payoff worse due to the high defense cost. Thus, all the nodes with $r_i w_i - C_i^D < 0$ are in set $G$ only. Then, for $\forall i \in F$, $\mu_i(\bar{m}_i) = \mu_d$ and $\mu_i(m_i) \ge \mu_d$. Since $\mu_i$ is decreasing in $m_i$, we have $m_i \le \bar{m}_i$. For $\forall j \in E$, since $\mu_j(\bar{m}_j) = \mu_d$ and $\mu_j(m_j) \le \mu_d$, $\bar{m}_j$ is actually a lower bound for $m_j$. Setting $m_j = \bar{m}_j$ minimizes the cost from node $j$, which is $m_j C_j^D$, and hence the cost of the whole problem, since it also uses the minimum budget from $B$. Therefore, more budget can be allocated to $m_i$, $i \in F$, to minimize the cost from the nodes in set $F$. Further, it is easy to check that $\bar{m}_j$ is never larger than $\frac{1}{w_j}$ for any given nonnegative $\mu_d$. As to the 5th property, if $B - \sum_{i \in E} \bar{m}_i - \bar{m}_d < 0$, there is no budget for nodes in sets $F$ and $D$, which means $F$ and $D$ are both empty. According to the greedy method, this only happens when $M = 0$, which violates our assumption. Therefore, $B - \sum_{i \in E} \bar{m}_i - \bar{m}_d \ge 0$. □
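The inverse function used in Lemma 8 can be written out explicitly. The short derivation below assumes the per-unit attack payoff $\mu_i(m_i) = \frac{r_i - m_i (r_i w_i + C_i^A)}{m_i w_i}$, i.e., the knapsack ratio of the attacker's problem (24); it is a worked step added for illustration, not part of the original proof.

```latex
\begin{align*}
\mu\, m_i w_i &= r_i - m_i (r_i w_i + C_i^A) \\
m_i \left[ (\mu + r_i) w_i + C_i^A \right] &= r_i \\
m_i(\mu) &= \frac{r_i}{(\mu + r_i) w_i + C_i^A}
\end{align*}
```

Under this form, $m_i(\mu)$ is strictly decreasing in $\mu$, and for $\mu_d \ge 0$ and $C_i^A \ge 0$ we get $\bar{m}_i = m_i(\mu_d) \le \frac{r_i}{r_i w_i} = \frac{1}{w_i}$, which is the bound stated as the fourth property.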

Remark 2. If $\mu_d < 0$, the defender can give less budget to the corresponding node to bring $\mu_d$ back to 0. In either case, the payoffs from nodes in sets $D$ and $E$ are 0, since the attacker gives up attacking the nodes in sets $D$ and $E$. Thus, the defender has more budget to defend the nodes in sets $F$ and $G$, which brings it a higher payoff. Therefore, we only need to consider nonnegative $\mu_d$.

Lemma 9. There exists an optimal solution vector with at most $\min\{2, n\}$ fractional values in a 2-D fractional knapsack problem, where $n$ is the number of variables in the knapsack problem.

A proof of this result can be found in Chapter 9 of [12].


Lemma 10. For any nonnegative $\mu_d$, there exists an optimal solution for (23)-(24) such that $\forall i \in F$, there are at most two $m_i < \bar{m}_i$ and all the other $m_i = \bar{m}_i$.

Proof. Suppose the set allocation and $\mu_d$ are fixed, which means $m_d$ and $\bar{m}_i$, $\forall i$, are also fixed. According to Lemma 7 and Lemma 8, we can now convert (23)-(24) to the following problem:

$$\begin{aligned} \min_{m_i,\, i \in F} \quad & \sum_{i \in F} \left[ r_i (1 - m_i w_i) + m_i C_i^D \right] + \sum_{i \in G} r_i + \sum_{i \in E} \bar{m}_i C_i^D + p\, r_d (1 - w_d m_d) + m_d C_d^D \\ \text{s.t.} \quad & \sum_{i \in F} m_i \le B - \sum_{i \in E} \bar{m}_i - m_d \\ & \sum_{i \in F} w_i m_i + p\, w_d m_d \le M \\ & 0 \le m_i \le \bar{m}_i,\ \forall i \in F \end{aligned} \tag{25}$$

where $p = \min\left\{1, \frac{M - \sum_{i \in F} w_i m_i}{w_d m_d}\right\}$.

It is important to note that if $p = 1$, all the nodes are in set $F$; otherwise, such $(m, p)$ would violate the structure of the greedy solution of (24). Then (25) becomes a 2-D fractional knapsack problem with variables $m_i$, $i \in F$. According to Lemma 9, there exists an optimal solution with at most two fractional variables, which means at most two $m_i < \bar{m}_i$.

If $p = \frac{M - \sum_{i \in F} w_i m_i}{w_d m_d}$, we can put $p$ back into the objective function of (25) and convert it to

$$\begin{aligned} \min_{m_i,\, i \in F} \quad & \sum_{i \in F} \left[ r_i (1 - m_i w_i) + m_i C_i^D \right] + \sum_{i \in G} r_i + \sum_{i \in E} \bar{m}_i C_i^D + \frac{M - \sum_{i \in F} w_i m_i}{w_d m_d}\, r_d (1 - w_d m_d) + m_d C_d^D \\ \text{s.t.} \quad & \sum_{i \in F} m_i \le B - \sum_{i \in E} \bar{m}_i - m_d \\ & 0 \le m_i \le \bar{m}_i,\ \forall i \in F \end{aligned} \tag{26}$$

It is easy to see that (26) is a fractional knapsack problem. Thus, there is at most one fractional variable, which means at most one $m_i < \bar{m}_i$. □
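To make the objective of (25) concrete, the sketch below evaluates the defender's cost for a fixed set allocation. It is only an illustration under the notation above: the function name and argument layout are hypothetical, node $d$ is assumed to have $m_d > 0$, and the attack probability on $d$ is taken as $p = \min\{1, (M - \sum_{i \in F} w_i m_i)/(w_d m_d)\}$.

```python
def defender_cost(F, G, E, d, m, m_bar, r, w, c_def, M):
    """Evaluate the cost in (25) for a given set allocation.

    F, G, E are collections of node indices, d is the single node in D,
    m holds the defense rates of nodes in F and of d, m_bar the values
    m_i(mu_d) used for nodes in E. Returns (p, cost).
    """
    used = sum(w[i] * m[i] for i in F)  # attack budget consumed by set F
    if used > M:
        raise ValueError("nodes in F alone exceed the attack budget M")
    p = min(1.0, (M - used) / (w[d] * m[d]))  # attack probability on node d
    cost = sum(r[i] * (1 - m[i] * w[i]) + m[i] * c_def[i] for i in F)
    cost += sum(r[i] for i in G)               # undefended, always attacked
    cost += sum(m_bar[i] * c_def[i] for i in E)  # defended, never attacked
    cost += p * r[d] * (1 - w[d] * m[d]) + m[d] * c_def[d]
    return p, cost
```

With four identical nodes ($r_i = w_i = 1$, $C_i^D = 0.5$), $F = \{0\}$ with $m_0 = 0.2$, $d = 1$ with $m_d = 0.4$, $E = \{2\}$ with $\bar{m}_2 = 0.4$, $G = \{3\}$ and $M = 0.4$, the function returns $p = 0.5$ and a cost of $2.6$.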

From the above lemmas, we can establish the following results about the structure of the optimal solution for (23)-(24).

Proposition 1. For any nonnegative $\mu_d$, there exists an optimal solution $\{m_i\}_{i=1}^{n}$ such that

1. $\forall i \in F$, there are at most two $m_i < \bar{m}_i$ and all the other $m_i = \bar{m}_i$;
2. $m_d = \bar{m}_d$;
3. $\forall i \in E$, $m_i = \bar{m}_i$;
4. $\forall i \in G$, $m_i = 0$.

According to Proposition 1, for any nonnegative $\mu_d$, once the set allocation is determined, the value of $m_i$ can be immediately determined for all the nodes except the two fractional nodes in set $F$. Further, for the two fractional nodes, their $m_i$ can be found using linear programming as discussed below. From these observations, we can convert (23)-(24) to (27) for any given nonnegative $\mu_d$, $d$, $f_1$ and $f_2$.


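$$\begin{aligned} \max_{E, F, G,\, m_{f_1}, m_{f_2},\, p} \quad & \sum_{i \in F \setminus \{f_1, f_2\}} \left[ \bar{m}_i (r_i w_i - C_i^D) - r_i \right] + \sum_{j=1}^{2} \left[ m_{f_j} (r_{f_j} w_{f_j} - C_{f_j}^D) - r_{f_j} \right] \\ & - \sum_{i \in G} r_i - \sum_{i \in E} \bar{m}_i C_i^D + m_d (p\, r_d w_d - C_d^D) - p\, r_d \\ \text{s.t.} \quad & \sum_{i \in F \setminus \{f_1, f_2\}} \bar{m}_i + m_{f_1} + m_{f_2} + m_d + \sum_{i \in E} \bar{m}_i \le B \\ & \sum_{i \in F \setminus \{f_1, f_2\}} w_i \bar{m}_i + w_{f_1} m_{f_1} + w_{f_2} m_{f_2} + p\, w_d m_d \le M \\ & 0 \le m_{f_1} \le \bar{m}_{f_1},\ 0 \le m_{f_2} \le \bar{m}_{f_2},\ 0 \le p \le 1 \end{aligned} \tag{27}$$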

Note that the set allocation is part of the decision variables in (27).

We then propose the following algorithm for the defender's problem (see Algorithm 1). The algorithm iterates over nonnegative $\mu_d$ (with a step size $\mu_{step}$) (lines 3-10). For each $\mu_d$, it iterates over all possible nodes $d$ in set $D$, and all possible nodes $f_1, f_2$ with fractional assignment in set $F$ (lines 5-8). Given $\mu_d, d, f_1, f_2$, the best set allocation (together with $m_i$ for all $i$ and $p$) is determined using dynamic programming as explained below (lines 6-7), where we first assume that $B$, $M$, $\bar{m}_i$ and $w_i$ have been rounded to integers for all $i$. The loss of performance due to rounding will be discussed later.

Consider any $\mu_d$, node $d$ in set $D$, and nodes $f_1, f_2$ with fractional assignment in set $F$. Let $SEQ(i, b, m, d, f_1, f_2, ind)$ denote the maximum payoff of the defender considering only node 1 to node $i$ (excluding nodes $d$, $f_1$ and $f_2$), for given budgets $b$ and $m$ for the two constraints in (27), respectively. Here $ind$ is a boolean variable that indicates whether the second constraint of (27) is tight for nodes 1 to $i$. If $ind$ is True, all the budget $m$ is used up for nodes 1 to $i$; if $ind$ is False, there is still budget $m$ available for the attacker. Here, $0 \le b \le B$ and $0 \le m \le M$. The value of $SEQ(i, b, m, d, f_1, f_2, ind)$ is determined recursively as follows. If $b < 0$ or $m < 0$, the value is set to $-\infty$. If node $i$ is one of $d$, $f_1$ and $f_2$, we simply set $SEQ(i, b, m, d, f_1, f_2, ind) = SEQ(i - 1, b, m, d, f_1, f_2, ind)$. Otherwise, we have the following recurrence equation, where the three cases refer to the maximum payoff when putting node $i$ in set $F$, $E$, and $G$, respectively.


$$\begin{aligned} SEQ(i, b, m, d, f_1, f_2, ind) = \max\bigl\{ & SEQ(i - 1, b - \bar{m}_i, m - w_i \bar{m}_i, d, f_1, f_2, ind) + \bar{m}_i (r_i w_i - C_i^D) - r_i, \\ & SEQ(i - 1, b - \bar{m}_i, m, d, f_1, f_2, ind) - \bar{m}_i C_i^D, \\ & SEQ(i - 1, b, m, d, f_1, f_2, ind) - r_i \bigr\} \end{aligned} \tag{28}$$

Meanwhile, if $ind$ is False, node $i$ can be allocated to set $E$ only if $r_i - \bar{m}_i (r_i w_i + C_i^A) \le 0$. Otherwise, there is still available budget for the attacker to attack other nodes with reward greater than 0, which violates the structure of the greedy solution for (24). Also, if $ind$ is False, it means $m$ is not used up. Thus we should return $-\infty$ if $ind$ is False, $i > 0$ and $m = 0$.
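The recurrence and the side conditions above can be sketched as a memoized recursion. This is a rough illustration under several assumptions rather than the authors' implementation: budgets and $\bar{m}_i$, $w_i$ are taken to be already rounded to integers, nodes are numbered 1 to $n$ while the parameter lists are 0-based, and `base` is a hypothetical callable standing in for the value of $SEQ(0, b, m, d, f_1, f_2, ind)$, which the text obtains from a small linear program.

```python
import math
from functools import lru_cache

def make_seq(m_bar, r, w, c_def, c_att, d, f1, f2, base):
    """Build SEQ(i, b, m, ind) for fixed d, f1, f2 (nodes are 1-based)."""
    skip = {d, f1, f2}

    @lru_cache(maxsize=None)
    def seq(i, b, m, ind):
        if b < 0 or m < 0:
            return -math.inf              # a budget was overspent
        if i == 0:
            return base(b, m, ind)        # only d, f1, f2 remain
        if not ind and m == 0:
            return -math.inf              # ind False requires leftover m
        if i in skip:
            return seq(i - 1, b, m, ind)  # d, f1, f2 are handled in base
        k = i - 1                         # 0-based index into the lists
        mb = m_bar[k]
        in_F = (seq(i - 1, b - mb, m - w[k] * mb, ind)
                + mb * (r[k] * w[k] - c_def[k]) - r[k])
        in_E = seq(i - 1, b - mb, m, ind) - mb * c_def[k]
        in_G = seq(i - 1, b, m, ind) - r[k]
        # With attack budget left over, a node may sit in E only if
        # attacking it is not profitable.
        if not ind and r[k] - mb * (r[k] * w[k] + c_att[k]) > 0:
            in_E = -math.inf
        return max(in_F, in_E, in_G)

    return seq
```

As a usage example, with a single node ($\bar{m}_1 = 1$, $r_1 = 2$, $w_1 = 1$, $C_1^D = 0.5$, $C_1^A = 1$), a base value of 0 and no excluded nodes, `seq(1, 2, 2, True)` evaluates the three placements as $-0.5$ (set $F$), $-0.5$ (set $E$) and $-2$ (set $G$) and returns $-0.5$. Memoization keeps the number of distinct states at the order of $n \cdot B \cdot M$ for each choice of $d, f_1, f_2$.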

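Moreover, we let $SEQ(0, b, m, d, f_1, f_2, ind)$ denote the maximum defense payoff when only nodes $d$, $f_1$, and $f_2$ are considered. If $ind$ is True, the following linear program in (29) determines the optimal values of $p$, $m_{f_1}$ and $m_{f_2}$ for given budgets $b$ and $m$:

$$\begin{aligned} \max_{m_{f_1}, m_{f_2}} \quad & \sum_{j=1}^{2} \left[ m_{f_j} (r_{f_j} w_{f_j} - C_{f_j}^D) - r_{f_j} \right] + m_d (p\, r_d w_d - C_d^D) - p\, r_d \\ \text{s.t.} \quad & m_{f_1} + m_{f_2} + m_d \le b \\ & m_{f_1} w_{f_1} + m_{f_2} w_{f_2} \le m \\ & m_{f_1} \le \bar{m}_{f_1},\ m_{f_2} \le \bar{m}_{f_2} \\ & p = \frac{m - m_{f_1} w_{f_1} - m_{f_2} w_{f_2}}{w_d m_d} \le 1 \end{aligned} \tag{29}$$

If $ind$ is False, we must have $p = 1$. The optimal values of $m_{f_1}$ and $m_{f_2}$ are determined by (30):

$$\begin{aligned} \max_{m_{f_1}, m_{f_2}} \quad & \sum_{j=1}^{2} \left[ m_{f_j} (r_{f_j} w_{f_j} - C_{f_j}^D) - r_{f_j} \right] + m_d (r_d w_d - C_d^D) - r_d \\ \text{s.t.} \quad & m_{f_1} + m_{f_2} + m_d \le b \\ & m_{f_1} w_{f_1} + m_{f_2} w_{f_2} \le m - w_d m_d \\ & m_{f_1} \le \bar{m}_{f_1},\ m_{f_2} \le \bar{m}_{f_2} \end{aligned} \tag{30}$$
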

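Algorithm 1 Sequential Strategy for Defender
1: Initialize $\mu_{step}$
2: $\mu_{max} \leftarrow \min\{\mu : \sum_{i=1}^{n} w_i \bar{m}_i(\mu) \le M\}$
3: for $\mu_d \leftarrow 0$ to $\mu_{max}$ with step size $\mu_{step}$ do
4:     $\bar{m}_i \leftarrow m_i(\mu_d)$ for all $i$
5:     for $d, f_1, f_2 \leftarrow 1$ to $n$ do
6:         $val_{d, f_1, f_2} \leftarrow SEQ(n, B, M, d, f_1, f_2, True)$
7:         $val'_{d, f_1, f_2} \leftarrow SEQ(n, B, M, d, f_1, f_2, False)$
8:     end for
9:     $C_{dp}(\mu_d) \leftarrow \max_{d, f_1, f_2}\{val_{d, f_1, f_2}, val'_{d, f_1, f_2}\}$
10: end for
11: $C_{alg} \leftarrow \max_{\mu_d}\{C_{dp}(\mu_d)\}$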

Since the dynamic program searches over all the possible solutions that satisfy Proposition 1, $C_{dp}(\mu_d)$ gives us the optimal solution of (23)-(24) for any given nonnegative $\mu_d$. Algorithm 1 then computes the optimal solution by searching over all the nonnegative $\mu_d$. Note that $d$, $f_1$ and $f_2$ can be equal, to include the case that there is only one or zero node in set $F$. The minimum possible value of $\mu$ is 0 (explained in Remark 2). The maximum possible value of $\mu$ is $\min\{\mu : \sum_{i=1}^{n} w_i \bar{m}_i(\mu) \le M\}$. For larger $\mu$, the sum of all $w_i \bar{m}_i$ will be less than $M$. In this case, all the nodes will be in set $F$ and $p_i = 1$ $\forall i$, which makes (23)-(24) a simple knapsack problem that can be easily solved.

Additionally, since the dynamic program searches over all feasible integer values, we use a simple rounding technique to guarantee that it is implementable. Before
the execution of SEQ(n, B, M, d, /i, f^, ind), we set rrii <— [^J, im <- for 
all i and B •<— [-jJ, M where <5 is an adjustable parameter. Intuitively, 

by making $\delta$ and $\mu_{step}$ small enough, Algorithm 1 can find a strategy that is arbitrarily close to the subgame perfect equilibrium strategy of the defender. Formally, we can establish the following result.


Lemma 11. For any $\epsilon$, if $\mu_{step} \le \frac{\epsilon}{A}$, we have $\frac{|C_{alg}|}{|C^*|} \le 1 + \epsilon$, where $C_{alg}$ is the payoff of the strategy found by Algorithm 1, $C^*$ is the optimal payoff and

A = maxi {+ max ° K+ % >)• max *{^■ max{ 1 , + 




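Proof. Denote by $\mu^*$ the optimal $\mu_d$ for computing $C^*$, and by $\mu'$ the first $\mu_d$ that is greater than $\mu^*$ in Algorithm 1. Then we have

$$\Delta m_i = \frac{r_i}{(\mu^* + r_i) w_i + C_i^A} - \frac{r_i}{(\mu' + r_i) w_i + C_i^A} \le \frac{(\mu' - \mu^*)\, r_i w_i}{[(\mu^* + r_i) w_i + C_i^A]^2} \le \frac{\mu_{step}\, r_i w_i}{[(\mu^* + r_i) w_i + C_i^A]^2} \tag{31}$$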
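The first inequality in (31) can be checked in one line. The step below is added for illustration and writes $g_i(\mu) = (\mu + r_i) w_i + C_i^A$, a shorthand introduced here only for this purpose, so that $m_i(\mu) = r_i / g_i(\mu)$.

```latex
\begin{align*}
\Delta m_i
  = \frac{r_i}{g_i(\mu^*)} - \frac{r_i}{g_i(\mu')}
  = \frac{r_i \left[ g_i(\mu') - g_i(\mu^*) \right]}{g_i(\mu^*)\, g_i(\mu')}
  = \frac{(\mu' - \mu^*)\, r_i w_i}{g_i(\mu^*)\, g_i(\mu')}
  \le \frac{(\mu' - \mu^*)\, r_i w_i}{g_i(\mu^*)^2}
\end{align*}
```

The last step uses $g_i(\mu') \ge g_i(\mu^*)$, which holds because $g_i$ is increasing in $\mu$ and $\mu' > \mu^*$. The second inequality in (31) then follows from $\mu' - \mu^* \le \mu_{step}$, since $\mu'$ is the first grid point above $\mu^*$.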


In addition, we have $\frac{r_i}{(\mu_{max} + r_i) w_i + C_i^A} \le \bar{m}_i \le \frac{r_i}{r_i w_i + C_i^A}$ for any $\mu_d$ and $i$. Now, we consider one particular solution for $\mu'$ in which the set allocation is the same as in the optimal solution of (23) but each $\bar{m}_i$ decreases by $\Delta m_i$. Thus, the cost increases in two parts due to the decrease of $\bar{m}_i$. The first part is from the nodes in $F$, and the second part comes from the extra budget available to the attacker to attack the nodes in sets $D$ and $E$. Let $\bar{m}_i = m_i(\mu^*)$ and $\bar{m}'_i = m_i(\mu')$. It follows that


\C a ig\ - |C*| < Eiefuc A mfr l w i -Cf > ) + (£" = i A m iWi ) • max i { n(1 u|< ^ ) } 


\C*\ ~ C * 

riWi(riWi—C^ ) , /v^ n _ FiW“ 


E riWiyriWi— ~ r i w i \ f r*»(1— wimf) -j 

ieFUD Pstep [( p * +r ..) u ,. +C A]2 + lAi=l Pstep [( p * +Ti ) Wi +Cf ] 2 > ' max n W 4 rn$ J 


< 


l,EieFuD[ ri (l miWi) + iriiCP] Yi^FuDuE m( ^P + EieG r * 

. rriiWi{riWi — Cf) EEiCVd) 2 ' m axj{p max + n + 

< p step max{--) + 


C* 


Zi 


&FUD 


m. 


f (nwi - CP)wi/n Er=i {miWif/n ■ max,{^} 


r*C[> 


ZieFUDJE (p maI +ri)iOi+Cf EieG 

Pstep ( max{} + max{4 ( ' Pmax + Ci } ■ ma x{p max + r t + — }^) 

\ 1 E i i r i r i U i 1 Wi ) 

(32) 
Further, according to the definition of $\mu_{max}$ in Algorithm 1, we have $\mu_{max} \le$
nmmi{i~ii»i} Put ft j n an d we g e t 


$$\frac{|C_{alg}|}{|C^*|} = 1 + \frac{|C_{alg}| - |C^*|}{|C^*|} \le 1 + \mu_{step} \cdot A \le 1 + \epsilon \tag{33}$$

□ 


Based on Lemma 11, we have the following theorem.

Theorem 5. For any e, if p s tep < an< ^ ^ — W we ^ ave 1 < 1 + e where 
F = maxi{ 


ct 


/ n mini 
v M maxi{tUi} 


2\P \C*\ — 

< n mini{riit;i} 


1 \ -1 . / n mini \riWi\ . r . Cf* ^ \ 

H - —) r H - (-^7 -?—r H - m&x* \ ti — r) • 

Wi ' J ' M maxil^i) L L 1 Wi J' 


max >{| • maxi 1 ’ ( M m m n ax { a^} } +n)^h + %f>}} 

The proof is similar to that in Lemma[TIJ The only difference is the decrease of 
due to rounding. Adding the loss of rounding, we have Arm < [( p *+r‘)w?+c A \ 2 

Put it into (l33l) and we get ^0tr- < 1 + p s te P ■ A + 6 - F < 1 + e 

To find $SEQ(n, B, M, d, f_1, f_2, ind)$, the dynamic program searches over all possible values of $i \in \{0, \dots, n\}$, $b \le B$ and $m \le M$. Thus, the total time
complexity for the dynamic programming is 0( n ^ M ), and the time complexity 
for Algorithm Q] is Q( ” 8 g M ) 

6 Numerical Result 


In this section, we present numerical results for our game models. For the illustrations, we assume that all the attack times $w_i$ are deterministic as in Sections 4 and 5. We study the payoffs of both attacker and defender and their strategies in both Nash Equilibrium and subgame perfect equilibrium in a two-node setting, and study the impact of various parameters including the resource constraints $B$, $M$, and the unit value $r_i$. We further study the payoffs and strategies for both players in subgame perfect equilibrium in a five-node setting, and study the impact of various parameters.

We first study the impact of the resource constraints $M$, $B$, and the unit value $r_1$ on the payoffs for the two-node setting in Figure 2. In the figure, we have plotted both Type 1 and Type 5 NE and the subgame perfect equilibrium. (There are also Type 2 NEs, which are omitted for the sake of clarity.) Type 5 NE only occurs when $M$ is small, as shown in Figure 2(a), while Type 1 NE appears when $B$ is small, as shown in Figure 2(b). This is expected since $B$ is fully utilized in a Type 1 NE while $M$ is fully utilized in a Type 5 NE. When the defense budget $B$ becomes large, the summation of $m_i$ does not necessarily equal $B$ and thus the Type 1 NE disappears. Similarly, the Type 5 NE disappears for a large attack budget $M$. In Figures 2(c) and 2(d), we vary the unit value of node 1, $r_1$. At the beginning, the defender protects node 2 only since $w_2 > w_1$. As $r_1$ becomes larger and larger, the defender starts to change its strategy by protecting node 1 instead of node 2 in the Type 1 NE. On the other hand, since node 1 is fully protected by the defender and the defender gives up defending node 2, the attacker begins to attack node 2 with probability 1, and uses the rest of its budget to attack node 1 with probability less than 1, due to the high defending frequency and the limited resources $M$. We further observe that in both the simultaneous game and the sequential game, the value of $m_1$ increases as $r_1$ increases, while the value of $m_2$ decreases at the same time. This implies that the defender tends to protect the nodes with higher values more frequently. In addition, the subgame perfect equilibrium always brings the defender a higher payoff than the Nash Equilibrium, which is expected.




[Figure 2 panels: (a) Payoffs with varying $M$; (b) Payoffs with varying $B$; (c) Payoffs with varying $r_1$; (d) Strategies with varying $r_1$ (legend: defender $m_i$ and attacker $p_i$ in Type 1 NE, defender $m_i$ in SPE)]

Fig. 2: The effects of varying resource constraints, where in all the figures, $r_2 = 1$, $w_1 = 1.7$, $w_2 = 1.6$, $C_1^D = 0.5$, $C_2^D = 0.6$, $C_1^A = 1$, $C_2^A = 1.5$, and $r_1 = 2$ in (a) and (b), $B = 0.3$ in (a), (c), and (d), and $M = 0.1$ in (b), (c), and (d).


Moreover, it is interesting to observe that under the Type 5 NE, the attacker's payoff decreases for a larger $M$, as shown in Figure 2(a). This is because the defender's budget $B$ is not fully utilized in the Type 5 NE, and the defender can use more budget to protect both nodes when $M$ increases. The increase of the attacker's payoff from having a larger $M$ is canceled by the increase of the defender's move frequencies $m_1$ and $m_2$. We also note that the Type 5 NE is less preferable for the defender in Figure 2(c) when $r_1$ is small and favors the defender as $r_1$ increases, which tells us that the defender may prefer different types of NEs under different scenarios, and so does the attacker.


[Figure 3 panels: (a) Payoffs and strategies with varying $M$; (b) Payoffs and strategies with varying $r_1$]

Fig. 3: The effects of varying resource constraints and $r_1$, where $w = [2\ 2\ 2\ 2\ 2]$, $C^D = C^A = [1\ 1\ 1\ 1\ 1]$, $B = 0.5$, $r = [5\ 4\ 3\ 2\ 1]$ in (a), $r = [r_1\ 1\ 1\ 1\ 1]$ and $M = 0.3$ in (b).

We then study the effects of varying $M$ and $r_1$ on both players' payoffs and strategies in the sequential game for the five-node setting. In Figure 3(a), the parameters of all the nodes are the same except $r_i$. We vary the attacker's budget $M$ from 0 to 1. When $M = 0$, the defender can set $m_i$ for all $i$ to arbitrarily small (but positive) values, so that the attacker is unable to attack any node, leading to a zero payoff for both players. As $M$ becomes larger, the attacker's payoff increases, while the defender's payoff decreases, and the defender tends to defend the nodes with higher values more frequently, as shown in Figure 3(a) (lower). After a certain point, the defender gives up some nodes and protects higher value nodes more often. This is because with a very large $M$, the attacker is able to attack all the nodes with high probability, so that defending all the nodes with small $m_i$ is less effective than defending high value nodes with large $m_i$. This result implies that the attacker's resource constraint has a significant impact on the defender's behavior: when $M$ is large, protecting high value nodes more frequently and giving up several low value nodes is more beneficial for the defender than defending all the nodes with low frequency.

In Figure 3(b), we vary $r_1$ while setting the other parameters to be the same for all the nodes. Since all the nodes other than node 1 are identical, they have the same $m_i$, as shown in Figure 3(b) (lower). We observe that the defender protects node 1 less frequently when $r_1$ is smaller than the unit value of the other nodes. When $r_1$ becomes larger, the defender defends node 1 more frequently, which tells us that the defender should protect the nodes with higher values more frequently in the subgame perfect equilibrium when all the other parameters are the same.

7 Conclusion

In this paper, we propose a two-player non-zero-sum game for protecting a system of multiple components against a stealthy attacker where the defender's behavior is fully observable, and both players have strict resource constraints. We prove that periodic defense and non-adaptive i.i.d. attack are a pair of best-response strategies with respect to each other. For this pair of strategies, we characterize the set of Nash Equilibria of the game, and show that there is always at least one equilibrium (and possibly more) for the case when the attack times are deterministic. We further study the sequential game where the defender first publicly announces its strategy, and design an algorithm that can identify a strategy that is arbitrarily close to the subgame perfect equilibrium strategy for the defender.
8 Appendix

Proof of Lemma 3. In order to get the attacker's best responses against any deterministic strategy of the defender, we first divide the attacker's problem into $N$ sub-optimization problems

$$\begin{aligned} \min_{a_{i,k}} \quad & \sum_{k=1}^{L_i} \frac{E[\min(a_{i,k} + w_i, X_{i,k})]\, r_i + P(a_{i,k} < X_{i,k})\, C_i^A}{T} \\ \text{s.t.} \quad & \sum_{k=1}^{L_i} \frac{E[\min(a_{i,k} + w_i, X_{i,k})] - E[\min(a_{i,k}, X_{i,k})]}{T} \le M_i \end{aligned} \tag{34}$$

where $\sum_{i=1}^{N} M_i = M$. Note that here we consider the equivalent minimization problem by taking the negative of the objective function of the attacker's problem and omitting the constant part. We further divide each sub-problem into $L_i$ sub-problems as follows


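$$\begin{aligned} \min_{a_{i,k}} \quad & \frac{E[\min(a_{i,k} + w_i, X_{i,k})]\, r_i + P(a_{i,k} < X_{i,k})\, C_i^A}{T} \\ \text{s.t.} \quad & \frac{E[\min(a_{i,k} + w_i, X_{i,k})] - E[\min(a_{i,k}, X_{i,k})]}{T} \le M_{i,k} \end{aligned} \tag{35}$$

where $\sum_{k=1}^{L_i} M_{i,k} = M_i$. We claim that the optimal solution to (35) is to allocate as much budget as possible to $P(a_{i,k} = 0)$, that is

$$a_{i,k}^* = \begin{cases} 0 & \text{w.p. } p_{i,k}^* \\ \ge X_{i,k} & \text{w.p. } 1 - p_{i,k}^* \end{cases} \tag{36}$$

where

$$p_{i,k}^* = \begin{cases} \min\left(1, \frac{M_{i,k} T}{E[\min(w_i, X_{i,k})]}\right) & \text{if } r_i (E[\min(w_i, X_{i,k})] - X_{i,k}) + C_i^A < 0 \\ 0 & \text{if } r_i (E[\min(w_i, X_{i,k})] - X_{i,k}) + C_i^A \ge 0 \end{cases} \tag{37}$$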
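The rule in (36)-(37) is easy to evaluate once the expectation is known. The sketch below is an illustration only: it assumes a deterministic attack time, so that $E[\min(w_i, X_{i,k})] = \min(w_i, X_{i,k})$, and positive $w_i$ and $X_{i,k}$; the function and argument names are hypothetical.

```python
def optimal_attack_probability(r_i, w_i, x_ik, c_att, m_ik, horizon):
    """Probability of attacking immediately (a_{i,k} = 0) in one interval.

    r_i: unit value of the node, w_i: deterministic attack time,
    x_ik: length of the k-th defense interval, c_att: attack cost C_i^A,
    m_ik: attack budget assigned to this interval, horizon: T.
    """
    # E[min(w_i, X_{i,k})] for a deterministic attack time.
    expected_attack_time = min(w_i, x_ik)
    # Sign of the coefficient of p_0: negative means attacking pays off.
    marginal_cost = r_i * (expected_attack_time - x_ik) + c_att
    if marginal_cost >= 0:
        return 0.0  # even an immediate attack is not worthwhile
    return min(1.0, m_ik * horizon / expected_attack_time)
```

For example, with $r_i = 1$, $w_i = 1$, $X_{i,k} = 4$, $C_i^A = 1$, $M_{i,k} = 0.25$ and $T = 2$, the coefficient is $-2 < 0$ and the budget allows $p^*_{i,k} = 0.5$; raising $C_i^A$ to 5 makes the coefficient nonnegative and the attacker never attacks in that interval.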

The proof of the claim is provided below. Since $M_{i,k}$ is any number such that $\sum_{k=1}^{L_i} M_{i,k} = M_i$, the optimal solution of (34) also has the same structure as (36). A similar argument also applies to the optimal solution of the original problem, although the optimal $p_{i,k}$ are not necessarily the same as in (37). We then prove our claim. For simplicity, we assume that $a_{i,k}$ is a discrete r.v., and without loss of generality, it has the following p.m.f.

$$a_{i,k} = \begin{cases} 0 & \text{w.p. } p_0 \\ v_1 & \text{w.p. } p_1 \\ \ \vdots & \\ v_n & \text{w.p. } p_n \\ \ge X_{i,k} & \text{w.p. } 1 - \sum_{j=0}^{n} p_j \end{cases} \tag{38}$$

where $n \in \mathbb{N}$ and $v_j \in \mathbb{R}$, $j = 1, \dots, n$, such that $0 < v_1 < v_2 < \cdots < v_n < X_{i,k}$. We note that the following proof can be adapted to the continuous case as well by replacing sums with integrals and the p.m.f. with a p.d.f.

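From the definition of $a_{i,k}$ we have

$$\begin{aligned} E[\min(a_{i,k} + w_i, X_{i,k})] &= p_0 E[\min(w_i, X_{i,k})] + \sum_{j=1}^{n} p_j E[\min(v_j + w_i, X_{i,k})] + \Big(1 - \sum_{j=0}^{n} p_j\Big) X_{i,k} \\ &= p_0 E[\min(w_i, X_{i,k})] + \sum_{j=1}^{n} p_j E[\min(w_i, X_{i,k} - v_j)] + \Big(1 - \sum_{j=0}^{n} p_j\Big) X_{i,k} + \sum_{j=1}^{n} p_j v_j \end{aligned}$$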


Problem (35) can then be converted to the following form

$$\begin{aligned} \min \quad & p_0 \left( r_i [E[\min(w_i, X_{i,k})] - X_{i,k}] + C_i^A \right) + \sum_{j=1}^{n} p_j \left( r_i [E[\min(v_j + w_i, X_{i,k})] - X_{i,k}] + C_i^A \right) + X_{i,k}\, r_i \\ \text{s.t.} \quad & p_0 E[\min(w_i, X_{i,k})] + \sum_{j=1}^{n} p_j E[\min(w_i, X_{i,k} - v_j)] \le M_{i,k} T \\ & \sum_{j=0}^{n} p_j \le 1 \end{aligned} \tag{39}$$

where we omit the constant $T$ in the objective function for simplicity. Let $J(\{p_0, \dots, p_n\})$ denote the objective function in (39).

Since $r_i (E[\min(w_i, X_{i,k})] - X_{i,k}) + C_i^A \le r_i (E[\min(v_j + w_i, X_{i,k})] - X_{i,k}) + C_i^A$, if $r_i (E[\min(w_i, X_{i,k})] - X_{i,k}) + C_i^A \ge 0$, $J(\{p_0, \dots, p_n\})$ is minimized by setting $p_j = 0$, $\forall j \ge 0$, which implies $a_{i,k} \ge X_{i,k}$ w.p. 1. This condition describes the case where, even if the attacker attacks the node immediately after it is recovered, its reward is still no greater than 0. Therefore, the attacker never attacks. If $r_i (E[\min(w_i, X_{i,k})] - X_{i,k}) + C_i^A < 0$, we claim that the optimal solution is to allocate as much budget $M_{i,k} T$ as possible to $p_0$, that is, we set all $p_j = 0$, $1 \le j \le n$, and $p_0 = \min\left(1, \frac{M_{i,k} T}{E[\min(w_i, X_{i,k})]}\right)$. This is clearly true if $r_i (E[\min(v_j + w_i, X_{i,k})] - X_{i,k}) + C_i^A \ge 0$. Therefore, it suffices to consider the case when $r_i (E[\min(w_i, X_{i,k})] - X_{i,k}) + C_i^A \le r_i (E[\min(v_j + w_i, X_{i,k})] - X_{i,k}) + C_i^A < 0$.

To prove the claim, consider an optimal solution $\{p_0, p_1, \dots, p_n\}$ to (39). We show that if $p_0 < \min\left(1, \frac{M_{i,k} T}{E[\min(w_i, X_{i,k})]}\right)$, then we can find another optimal solution $\{p'_0, p'_1, \dots, p'_n\}$ such that $p'_0 > p_0$. We distinguish the following two cases:


Case 1: $p_0 E[\min(w_i, X_{i,k})] + \sum_{j=1}^{n} p_j E[\min(w_i, X_{i,k} - v_j)] < M_{i,k} T$. Then by the optimality of $\{p_0, p_1, \dots, p_n\}$ and the assumption that $r_i (E[\min(v_j + w_i, X_{i,k})] - X_{i,k}) + C_i^A < 0$, we must have $\sum_{j=0}^{n} p_j = 1$. Let $j \ge 1$ denote an index such that $p_j > 0$. Then there must exist a small amount $\Delta p > 0$ such that $p'_0 = p_0 + \Delta p$, $p'_j = p_j - \Delta p$, $p'_k = p_k$, $\forall k \neq 0$ and $k \neq j$, is again a feasible solution to (39). We further have

$$\begin{aligned} J(\{p_0, \dots, p_n\}) - J(\{p'_0, \dots, p'_n\}) &= \Delta p \left( r_i [E[\min(v_j + w_i, X_{i,k})] - X_{i,k}] + C_i^A \right) - \Delta p \left( r_i [E[\min(w_i, X_{i,k})] - X_{i,k}] + C_i^A \right) \\ &= \Delta p\, r_i \left( E[\min(v_j + w_i, X_{i,k})] - E[\min(w_i, X_{i,k})] \right) \\ &\ge 0 \end{aligned}$$

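Case 2: $p_0 E[\min(w_i, X_{i,k})] + \sum_{j=1}^{n} p_j E[\min(w_i, X_{i,k} - v_j)] = M_{i,k} T$. Again let $j \ge 1$ denote an index such that $p_j > 0$. Then there must exist a small amount $\Delta M > 0$ such that $p'_0 = p_0 + \frac{\Delta M}{E[\min(w_i, X_{i,k})]}$, $p'_j = p_j - \frac{\Delta M}{E[\min(w_i, X_{i,k} - v_j)]}$, $p'_k = p_k$, $\forall k \neq 0$ and $k \neq j$, is a feasible solution to (39). We further have

$$\begin{aligned} J(\{p_0, \dots, p_n\}) - J(\{p'_0, \dots, p'_n\}) &= \frac{\Delta M \left( r_i [E[\min(v_j + w_i, X_{i,k})] - X_{i,k}] + C_i^A \right)}{E[\min(w_i, X_{i,k} - v_j)]} - \frac{\Delta M \left( r_i [E[\min(w_i, X_{i,k})] - X_{i,k}] + C_i^A \right)}{E[\min(w_i, X_{i,k})]} \\ &= \frac{\Delta M}{E[\min(w_i, X_{i,k} - v_j)]} \left( r_i v_j - r_i X_{i,k} + C_i^A \right) - \frac{\Delta M}{E[\min(w_i, X_{i,k})]} \left( -r_i X_{i,k} + C_i^A \right) \\ &\ge 0 \end{aligned}$$

□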